Information theoretic bounds for Compressed Sensing 

Shuchin Aeron, Venkatesh Saligrama and Manqi Zhao* 



Abstract 

In this paper we derive information theoretic performance bounds to sensing and reconstruction of 
sparse phenomena from noisy projections. We consider two settings: output noise models where the 
noise enters after the projection and input noise models where the noise enters before the projection. We 
consider two types of distortion for reconstruction: support errors and mean-squared errors. Our goal is 
to relate the number of measurements, m, and SNR, to signal sparsity, k, distortion level, d, and signal 
dimension, n. 

We consider support errors in a worst-case setting. We employ different variations of Fano's inequality 
to derive necessary conditions on the number of measurements and SNR required for exact reconstruction. 
To derive sufficient conditions we develop new insights on max-likelihood analysis based on a novel 
superposition property. In particular this property implies that small support errors are the dominant 
error events. Consequently, our ML analysis does not suffer the conservatism of the union bound and leads 
to a tighter analysis of max-likelihood. These results provide order-wise tight bounds. For output noise 
models we show that asymptotically an SNR of $\Theta(\log n)$ together with $\Theta(k \log(n/k))$ measurements is
necessary and sufficient for exact support recovery. Furthermore, if a small fraction of support errors 
can be tolerated, a constant SNR turns out to be sufficient in the linear sparsity regime. In contrast 
for input noise models we show that support recovery fails if the number of measurements scales as 
$o(n \log(n)/\mathrm{SNR})$, implying poor compression performance for such cases.

Motivated by the fact that the worst-case setup requires a significantly high SNR and a substantial number
of measurements for input and output noise models, we consider a Bayesian setup. To derive necessary
conditions we develop novel extensions to Fano's inequality to handle continuous domains and arbitrary 
distortions. We then develop a new max-likelihood analysis over the set of rate distortion quantization 
points to characterize tradeoffs between mean-squared distortion and the number of measurements using 
rate-distortion theory. We show that with constant SNR the number of measurements scales linearly 
with the rate-distortion function of the sparse phenomena. 



1 Introduction 

In this paper we derive information theoretic bounds on the performance of the Compressed Sensing problem, 

$$\mathbf{Y} = \mathbf{G}\mathbf{X} + \frac{\mathbf{N}}{\sqrt{\mathrm{SNR}}} \qquad (1)$$

where the measurements $\mathbf{Y} \in \mathbb{R}^{m \times 1}$, the desired signal $\mathbf{X} \in \mathbb{R}^{n}$, and the compression (sensing) matrix
$\mathbf{G} \in \mathbb{R}^{m \times n}$. The noise $\mathbf{N} \sim \mathcal{N}(0, I_m)$, where $I_m$ is the identity matrix of size $m$, is assumed to be a
Gaussian random vector with independent identically distributed (IID) components. We characterize results
for both deterministic and stochastic compression matrices $\mathbf{G} = [g_{ij}]$. For deterministic $\mathbf{G}$, the columns
$g_j$ are normalized to have unit $\ell_2$ norm. For the stochastic setting we consider matrices drawn from IID
Gaussian ensembles. Each component here is assumed to be distributed
as $g_{ij} \sim \mathcal{N}(0, 1/m)$, $i = 1, \ldots, m$, $j = 1, 2, \ldots, n$. Note that under this normalized sensing matrix scenario,
the term SNR also denotes the inverse of the noise variance. We refer to the signal model of Equation (1)
as the output noise model. In parallel we also consider the input noise model given by,

$$\mathbf{Y} = \mathbf{G}\left(\mathbf{X} + \frac{\mathbf{N}}{\sqrt{\mathrm{SNR}}}\right) \qquad (2)$$

where $\mathbf{N} \sim \mathcal{N}(0, I_n)$ is a Gaussian random vector with IID components. Evidently the noise here enters
before the "compression" operator, G, is applied. This model is motivated by fusion problems that arise in 
sensor networks @], where noisy observations are compressed. 
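
The two observation models can be simulated side by side. The sketch below is only illustrative: the function name `simulate_models`, the problem sizes, the constant amplitude `beta` on the support, and the $1/\sqrt{\mathrm{SNR}}$ scaling of the input noise (chosen to mirror the output noise model) are assumptions made for this example.

```python
import numpy as np

def simulate_models(n=200, m=60, k=5, snr=100.0, beta=1.0, seed=0):
    """Draw one realization of the output-noise and input-noise models."""
    rng = np.random.default_rng(seed)
    # Sensing matrix with IID N(0, 1/m) entries (the stochastic setting).
    G = rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n))
    # k-sparse signal; equal amplitude beta on a random support is an assumption.
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = beta
    # Output noise model: noise enters after the projection by G.
    y_out = G @ x + rng.standard_normal(m) / np.sqrt(snr)
    # Input noise model: noise enters before the projection by G.
    y_in = G @ (x + rng.standard_normal(n) / np.sqrt(snr))
    return G, x, y_out, y_in

G, x, y_out, y_in = simulate_models()
print(G.shape, y_out.shape, y_in.shape)
```

In the input noise model the effective noise seen at the output is $\mathbf{G}\mathbf{N}/\sqrt{\mathrm{SNR}}$, so all $n$ noise coordinates are folded into the $m$ measurements.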

The support of the signal $\mathbf{X}$ is denoted by $\mathrm{Supp}(\mathbf{X}) = \{j \mid X_j \neq 0\}$. We assume that the cardinality of
the signal support satisfies $|\mathrm{Supp}(\mathbf{X})| \le k \le n$. It is often convenient to state and interpret results in terms of the
sparsity ratio $\alpha_n = k/n$. The regime where $\alpha_n \to \alpha > 0$ as $n \to \infty$ is referred to as the linear regime, and the regime
where $\alpha_n \to 0$ as $n \to \infty$ is referred to as the sub-linear regime.

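We consider two types of distortions in signal reconstruction, namely, (a) support distortion, i.e.,
$d(\mathbf{X}, \hat{\mathbf{X}}) = \frac{1}{k}\sum_{j=1}^{n} \left| I_{\{X_j \neq 0\}} - I_{\{\hat{X}_j \neq 0\}} \right|$, where $I_{\{\cdot\}}$ is the indicator function, and (b) mean squared distortion,
$d(\mathbf{X}, \hat{\mathbf{X}}) = \frac{1}{n}\|\mathbf{X} - \hat{\mathbf{X}}\|^2 = \frac{1}{n}\sum_{j=1}^{n} |X_j - \hat{X}_j|^2$. These two distortion metrics address two different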
issues in signal recovery. The first metric penalizes solely the support detection part while the second metric 
penalizes both support detection and amplitude estimation. We will now highlight the main contributions 
and results of the paper. 
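
To make the two distortion measures concrete, the following sketch computes both for a small example. Normalizing the support distortion by the sparsity `k` (so that it reads as a fraction of the support in error) is an assumption of this example, as are the function names.

```python
import numpy as np

def support_distortion(x, x_hat, k):
    """Number of support mismatches, normalized by k (assumed normalization)."""
    mismatches = np.sum((x != 0) != (x_hat != 0))
    return mismatches / k

def mean_squared_distortion(x, x_hat):
    """Squared Euclidean error normalized by the signal dimension n."""
    return np.sum((x - x_hat) ** 2) / x.size

x = np.array([1.0, 0.0, 0.0, 2.0])
x_hat = np.array([0.5, 0.0, 1.0, 0.0])
d_support = support_distortion(x, x_hat, k=2)
d_mse = mean_squared_distortion(x, x_hat)
print(d_support, d_mse)
```

Here the first coordinate is counted as correct by the support metric even though its amplitude is off, whereas the mean squared metric penalizes it.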

1.1 Bounds for Exact and Approximate Support Recovery 

In this part we further restrict the signal $\mathbf{X}$ to be bounded away from zero by a constant $\beta > 0$ on its
support. This is a standard assumption employed by other researchers(see El [Zl E]) since it is impossible 
to identify the support of a signal X from noisy measurements with arbitrarily small non-zero components. 
We derive necessary and sufficient conditions for exact and approximate support recovery for this case 
under both output and input noise models. A central contribution of our work in this setting is that 
we explicitly quantify the required SNR and the number of measurements, m for exact support recovery. 
For the output noise model we show that the minimum SNR required for support recovery is $\Omega(\log n)$
regardless of $m$. In addition, for this minimum SNR level, the number of measurements, $m$, must scale as
$\Omega(k \log(n/k))$ to guarantee exact support recovery. Furthermore, we derive sufficient conditions and show
that with $\mathrm{SNR} = \Theta(\log n)$ and $m = \Omega(k \log(n/k))$ the maximum-likelihood decoder can exactly identify
the signal support with high probability. These results are depicted in Table 1. While not depicted in this
table it is interesting to consider what happens as SNR increases. The bounds derived in this paper show 
that we cannot get significant improvement in m unless SNR is scaled substantially (as a fractional power 
of n). We also derive conditions for support recovery for input noise models. Here our necessary conditions
say that if $m = o\left(\frac{n \log n}{\mathrm{SNR}}\right)$ then recovery is impossible. Evidently, either the SNR or the number
of measurements must scale linearly with n to ensure support recovery. Thus either we must operate in an 
essentially noiseless regime or forsake all compression. We also extend our results to approximate support 
recovery. Here we characterize a tradeoff between the number of measurements, SNR and support errors for different sparsity
ratios. These tradeoffs are summarized in the first column of Table 2. An interesting aspect of these results is
that a constant SNR is sufficient if we can tolerate a constant fraction of errors in the support recovery.
To establish the necessary conditions we use Fano's inequality and its variations [S]. For deriving sufficient
conditions we analyze the performance of the Maximum-Likelihood (ML) estimator based on a novel insight 
that every large support error event is essentially contained in the union of single support error events. 
This leads to a sharp bound that is order-wise optimal. Our necessary and sufficient conditions for different 
sparsity levels require similar scaling of SNR and the number of measurements (see Table 1).

Related Literature- The necessary condition that $\mathrm{SNR} = \Omega(\log n)$ irrespective of the number of
measurements was first reported by the authors in [10]. This paper extends these results to include necessary
conditions on the number of measurements. Similar conditions have also been reported by Fletcher et al.
[11], but due to the constraints imposed on the signal space (the signal is limited to have small amplitude
variations on its support elements) their conditions are conservative (see discussion in [5]) for our setup.
Necessary conditions have also been derived by Wainwright [5]. When the bounds of [5] (see Theorem 2




EXACT SUPPORT RECOVERY (Output Noise Model)

|                          | Linear Sparsity ($0 < \alpha = \alpha_n = k/n$)                        | Sub-Linear Sparsity ($\alpha_n = k/n = n^{-\gamma}$, $\gamma > 0$) |
| Necessity (this paper)   | $m = \Omega(n)$                                                        | $\mathrm{SNR} = \Omega(\log(2n))$, $m = \Omega(k \log(n/k))$       |
| Sufficiency (this paper) | $\mathrm{SNR} = 32\log(2n)/\beta^2$, $m = 6nH_2(2\alpha)$, $\alpha < 0.04$ | $\mathrm{SNR} = 32\log(2n)/\beta^2$, $m = 6k\log(n/k)$             |



Table 1: Summary of fundamental bounds for exact support recovery in the worst-case setting described in
Equation (1). $\beta$ is the minimum absolute value of the signal $\mathbf{X}$ on its support set; $H_2(\cdot)$ denotes the binary entropy
function; $k$ is the maximum allowable cardinality (sparsity) of the support of $\mathbf{X}$; $\alpha$ is the maximum sparsity ratio;
and $1/\mathrm{SNR}$ is the noise variance in each noise dimension. The necessary conditions are stated for arbitrary (not
necessarily IID) matrices, $\mathbf{G}$, such that the marginal distribution of each component has zero mean and variance
$1/m$. The sufficient conditions are stated for the case when each element of $\mathbf{G}$ is drawn IID $\sim \mathcal{N}(0, 1/m)$.



in [5]) are applied to our setup, they imply that the number of measurements scales as $\Omega(\log n)$, which is
conservative. In addition, [6] primarily imposes conditions on the number of measurements but does not
impose separate bounds on SNR. In contrast we show that unless SNR scales as $\Omega(\log n)$ support recovery is
impossible regardless of $m$. Furthermore, for $\mathrm{SNR} = \Theta(\log n)$ we show that $m$ must scale as $\Omega(k \log(n/k))$.
Sufficient conditions for support recovery for output noise models have been described in [6], [12], HU, [7], [8]
as well. Nevertheless, these upper bounds are also significantly weaker than those appearing here. Both
Wainwright [B] and Akcakaya et al. [H] use union bounding to derive error bounds for exact recovery.
Union bounds are generally conservative and result in requiring significantly high SNR, i.e., significantly
low admissible noise variance (see for instance, Theorem 1 in [6]). The sufficient conditions of Fletcher et al.
[11] are based on the Greedy Basis Pursuit algorithm. However, their analysis, as described earlier, constrains the
signals, $\mathbf{X}$, to have small amplitude variations on their support elements and when applied to our output noise
setup is conservative (see again the discussion in [5]). While [13, 12] derive some results for approximate support
recovery, the achievable region in terms of number of measurements and SNR as a function of achievable
distortion is implicitly stated and is therefore not comparable to the results presented here.



1.2 Rate distortion bounds 

In the second part of the paper, we consider sparse Bayesian signal models for X to fully exploit the power of 
information theoretic methods. This naturally leads us to characterizing necessary and sufficient conditions 
in terms of the rate distortion function. 

We first consider arbitrary pointwise distortion metrics, i.e., $\frac{1}{n} d(\mathbf{X}, \hat{\mathbf{X}}) = \frac{1}{n}\sum_{j=1}^{n} d(X_j, \hat{X}_j)$,

where $X_j$, $\hat{X}_j$ are the $j$-th components of $\mathbf{X}$, $\hat{\mathbf{X}}$ respectively. For deriving necessary conditions we develop a
new modified Fano's inequality that provides us with a worst case lower bound to the probability of error in
reconstruction to within a distortion $\frac{1}{n} d(\mathbf{X}, \hat{\mathbf{X}}) \le d_0$ in terms of the scalar rate distortion function $R_X(d_0)$
and the mutual information $I(\mathbf{X}; \mathbf{Y})$ between $\mathbf{X}$ and $\mathbf{Y}$. This bound is of independent interest since it can be
applied to non-sparsifying distributions as well. In particular we show that,

$$P\left(\frac{1}{n} d(\mathbf{X}, \hat{\mathbf{X}}) \ge d_0\right) \ge \frac{R_X(d_0) - C_0 - \frac{1}{n} I(\mathbf{X}; \mathbf{Y})}{R_X(d_0)}$$

for some small constant $C_0 < R_X(d_0)$.

For deriving sufficient conditions we compute upper bounds to the probability of error subject to a 
tolerable distortion based on the so called covering property of rate distortion theory. In particular we 
formalize a minimum distance decoder (distance measured in terms of given distortion metric) over the set 
of rate distortion quantization points. We then specialize our bounds to the mean squared distortion metric. 
The results are summarized in the second column of Table 2. Our necessary and sufficient conditions for the
number of measurements and SNR match within a constant factor for the linear sparsity regime. 
APPROXIMATE SUPPORT RECOVERY - Sufficient conditions (Linear Sparsity Regime)


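Support Error Distortion $\frac{1}{k}\sum_{j=1}^{n} \left| I_{\{X_j \neq 0\}} - I_{\{\hat{X}_j \neq 0\}} \right| \le d_0$


Mean Squared Distortion $\frac{1}{n}\|\mathbf{X} - \hat{\mathbf{X}}\|^2 \le d_0$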


SNR - Q ^(2a„d )^ and m _ n ( nH2 (2a n )) 


m = Q(nRx(do/2)), for SNR = fl{ Rx ^ lo/2) ). 



Table 2: The first column describes the achievable rate regions for approximate support recovery. Support error
distortion $d_0$ is the fraction of the true support in error. The second column describes the results for the Bayesian
set-up in terms of the scalar rate distortion function for varying mean squared distortion. Here the distortion $d_0$ is
the desired mean squared distortion.



Related Literature- Rate distortion analysis has been reported in [14], [15] for mean squared error and
for a Gaussian source. In contrast our expressions apply to general distortion measures and to any source for
which a rate distortion function is defined. These results appeared in our preliminary work [16]. In addition
the results in [14] for the case when $\mathbf{G}$ is random are only proven for $k = 1$. In contrast in this paper we
prove results for general $k = \alpha n$. For a fixed problem size $(n, k, m)$ the results in [15] are stated in terms of
a critical SNR threshold. This makes the expressions implicit in the number of measurements required as a 
function of signal sparsity and therefore the scaling laws are unclear. 

The rest of the paper is organized as follows. In Section 2 we present our problem set-up. Here the
notion of Sensing Capacity is introduced to study the asymptotic behavior of both the output noise and
input noise models. Section 3 presents necessary and sufficient conditions for support recovery. In Section 4
we consider the Bayesian setup and derive bounds for signal recovery under arbitrary distortion measures.
This requires us to generalize the traditional Fano's inequality to general (average) distortion measures and
continuous signal spaces. We also provide extensions of Fano's inequality for discrete signal spaces with
Hamming distortion in reconstruction. Section 4.2 presents a novel ML upper bound for signal recovery to
within a given squared distortion level. Using these results, in Section 5 we evaluate bounds for SNR and
number of measurements required to reconstruct $\mathbf{X}$ to different levels of distortion for output and input
noise models. We then comment on the differences between worst-case and Bayesian setups. 

2 Problem Set-up 

We consider output and input noise models described in Equations (1) and (2). The sparsity of $\mathbf{X}$ is modeled
both deterministically and stochastically, as is the compression matrix $\mathbf{G}$. We use bold-face to denote vectors
and matrices, while regular font is used to denote scalar components of the vectors and matrices. The $j$th
component of a vector $\mathbf{X}$ is denoted $X_j$, the $j$th column of a matrix $\mathbf{G}$ is denoted $g_j$ and its $ij$-th component
is denoted as $g_{ij}$. The cardinality of a set $S$ is denoted by $|S|$. Given a set $S \subseteq \{1, 2, \ldots, n\}$, $\mathbf{X}_S$ denotes the
signal, $\mathbf{X}$, restricted to the set of components indexed by $S$. Similarly, we denote by $\mathbf{G}_S$ the matrix formed
from columns indexed by $S$. We use $\Pr(\cdot)$ and $P(\cdot)$ interchangeably to denote the probability of an event.

Non-Random Sparsity Signal Model: We say that $\Xi^{(k)} \subset \mathbb{R}^{n}$ is a family of $k$-sparse sequences if for
every $\mathbf{X} \in \Xi^{(k)}$, the cardinality of the support of $\mathbf{X}$ is smaller than or equal to $k$. Formally, let

$$\mathrm{Supp}(\mathbf{X}) = \{j \mid X_j \neq 0\}$$

Then $\Xi^{(k)}$ is a family of $k$-sparse sequences if,

$$\Xi^{(k)} = \{\mathbf{X} : |\mathrm{Supp}(\mathbf{X})| \le k\} \qquad (3)$$

We will refer to the ratio $\alpha_n = k/n$ as the sparsity ratio. We will often work with subsets $\Xi^{(k)}_{\beta} \subset \Xi^{(k)}$.
These are sequences whose minimum absolute value on the support is bounded away from zero by a constant $\beta > 0$:

$$\Xi^{(k)}_{\beta} = \{\mathbf{X} \in \mathbb{R}^{n} : |\mathrm{Supp}(\mathbf{X})| \le k, \; |X_j| \ge \beta, \; \forall j \in \mathrm{Supp}(\mathbf{X})\} \qquad (4)$$

We will see when we derive necessary conditions that $\beta > 0$ is necessary for support recovery. This is
mainly because it is impossible to determine the support of a signal with arbitrarily small components under
noisy measurements. This condition is also assumed by other authors [171 17] .
We denote by $\Xi^{k} \subset \Xi^{(k)}$ the set consisting of exactly $k$-sparse sequences:

$$\Xi^{k} = \{\mathbf{X} : |\mathrm{Supp}(\mathbf{X})| = k\} \qquad (5)$$

This distinction is important and the reader should keep this in mind. The subset $\Xi^{k}_{\beta} \subset \Xi^{k}$ is analogously
defined.

Bayesian signal model: We say that a prior distribution on $\mathbf{X}$ is an asymptotically sparsifying distribution
if for sufficiently large $k$, $n$ the distribution concentrates all the measure on a subset of $\Xi^{(k)}$. In this
paper we will provide general results for arbitrary sparsifying priors and explicit bounds for the following
Gaussian mixture model, namely, each component of the signal is distributed as:

$$X_j \sim P_X = \alpha\,\mathcal{N}(\mu_1, \sigma_1^2) + (1 - \alpha)\,\mathcal{N}(\mu_0, \sigma_0^2)$$

The corresponding $n$-dimensional distribution of $\mathbf{X}$ is realized as a product measure on $\mathbb{R}^{n}$. As an example
note that for $\mu_1 = 1$, $\mu_0 = 0$ and $\sigma_1 = \sigma_0 \approx 0$ this mixture model asymptotically models binary sparse
sequences with sparsity highly concentrated around $k = \alpha n$. The main reason for using a Bayesian signal
model is that it lends itself to information theoretic tools and allows us to study the tradeoffs between the
number of measurements at different distortion levels for a given SNR.
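
A quick numerical check of the concentration claim is sketched below for a two-component Gaussian mixture prior with mixing weight `alpha`. The function name `sample_mixture`, the small standard deviations, and the sample size are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def sample_mixture(n, alpha, mu1=1.0, mu0=0.0, sigma1=1e-3, sigma0=1e-3, seed=0):
    """Draw n IID components from alpha*N(mu1, sigma1^2) + (1-alpha)*N(mu0, sigma0^2)."""
    rng = np.random.default_rng(seed)
    # Each component is "active" (drawn from the first Gaussian) with probability alpha.
    active = rng.random(n) < alpha
    x = np.where(active,
                 rng.normal(mu1, sigma1, size=n),
                 rng.normal(mu0, sigma0, size=n))
    return x, active

x, active = sample_mixture(n=10_000, alpha=0.1)
active_fraction = active.mean()
print(active_fraction)
```

With small variances the active components sit near 1 and the rest near 0, and the fraction of active components stays close to `alpha`, so the draw behaves like a binary sequence with roughly $\alpha n$ non-zero entries.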



2.1 Sensing Capacity 

The results developed in the paper are asymptotic, namely, we let the signal dimension $n$ and
the sparsity $k$ each approach infinity at different rates and derive bounds on the number of measurements,
$m$, and SNR, for exact/approximate reconstruction of $\mathbf{X}$. In this context we also derive bounds for $m$
and SNR for reconstruction of functions $\mathbf{Z} = f(\mathbf{X})$ of $\mathbf{X}$. For instance, we consider functions $f(\cdot)$ that
indicate the support or sign of $\mathbf{X}$. We denote by $\hat{\mathbf{X}}(\mathbf{Y})$ (resp. $\hat{\mathbf{Z}}(\mathbf{Y})$) an estimate of $\mathbf{X}$ (resp. $\mathbf{Z}$)
based on the observation $\mathbf{Y}$. The distortion between $\mathbf{Z}$ and its estimate $\hat{\mathbf{Z}}$ is denoted by
$\frac{1}{n} d(\mathbf{Z}, \hat{\mathbf{Z}}) = \frac{1}{n}\sum_{j} d(Z_j, \hat{Z}_j)$ for some scalar distortion metric $d(\cdot, \cdot)$.

The sensing capacity involves determining the largest ratio $\frac{n H_2(k/n)}{m}$ required for reconstruction
to within a desired distortion. To build motivation for this ratio, consider again the maximum sparsity ratio
$\alpha_n = k/n$. The number of possible support sets is $2^{\log_2 \sum_{j=0}^{k} \binom{n}{j}} = O(2^{n H_2(k/n)})$, where $H_2(\cdot)$ denotes the binary
entropy function [18]. The term $n H_2(k/n)$ is a measure of the entropy of the support set, i.e., the average
number of bits required to uniquely encode the support set. The sensing capacity measures the number
of source bits per measurement required for accurate decoding to a desired distortion level from compressed
measurements.

If sensing capacity is a constant, it implies that the number of measurements required is proportional to 
the source entropy. On the other hand if the sensing capacity approaches zero, it means that the number 
of measurements must increase significantly faster than the source entropy. This also implies that the 
compression operator G offers poor compression. 
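
The relation between the number of candidate supports and the entropy term $n H_2(k/n)$ can be checked numerically. The sketch below is an illustration with hypothetical helper names (`binary_entropy`, `support_bits`, `sensing_rate`); it uses the standard bound $\sum_{j \le k} \binom{n}{j} \le 2^{n H_2(k/n)}$, which holds for $k \le n/2$.

```python
import math

def binary_entropy(p):
    """Binary entropy H_2(p) in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def support_bits(n, k):
    """Exact log2 of the number of supports of size at most k, and the entropy bound."""
    exact = math.log2(sum(math.comb(n, j) for j in range(k + 1)))
    bound = n * binary_entropy(k / n)
    return exact, bound

def sensing_rate(n, k, m):
    """Source bits per measurement, n*H_2(k/n)/m."""
    return n * binary_entropy(k / n) / m

exact, bound = support_bits(n=100, k=10)
rate = sensing_rate(n=100, k=10, m=50)
print(exact, bound, rate)
```

A sensing rate that stays bounded away from zero as $n$ grows corresponds to a number of measurements proportional to the source entropy.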

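We next define the $\epsilon$-sensing capacity for a signal $\mathbf{X}$ of dimension $n$ and with maximum sparsity $k$. We
use $\Xi$ to denote a suitable subset of admissible signals, $\mathbf{X}$. This could be any subset such as those described
in Equations (3) and (4).

$$C^{1}_{n,\epsilon}(\mathrm{SNR}, \alpha_n, d_0) = C^{1}_{n,\epsilon}(\mathrm{SNR}, k, d_0) = \sup_{m}\left\{ \frac{n H_2(k/n)}{m} : E_{\mathbf{G}} \sup_{\mathbf{X} \in \Xi} P\left(\frac{1}{n} d(\mathbf{Z}, \hat{\mathbf{Z}}) \le d_0 \,\Big|\, \mathbf{G}, \mathbf{X}\right) \ge 1 - \epsilon \right\} \qquad (6)$$

where the probability is over $\mathbf{N}$. Note that one may choose a less conservative notion by interchanging the
order of $\max_{\mathbf{X} \in \Xi^{(k)}}$ and $E_{\mathbf{G}}$:

$$C^{2}_{n,\epsilon}(\mathrm{SNR}, \alpha_n, d_0) = C^{2}_{n,\epsilon}(\mathrm{SNR}, k, d_0) = \sup_{m}\left\{ \frac{n H_2(k/n)}{m} : \sup_{\mathbf{X} \in \Xi} E_{\mathbf{G}} P\left(\frac{1}{n} d(\mathbf{Z}, \hat{\mathbf{Z}}) \le d_0 \,\Big|\, \mathbf{G}, \mathbf{X}\right) \ge 1 - \epsilon \right\} \qquad (7)$$

For the Bayesian set-up the sensing capacity is defined as,

$$C^{3}_{n,\epsilon}(\mathrm{SNR}, \alpha_n, d_0) = C^{3}_{n,\epsilon}(\mathrm{SNR}, k, d_0) = \sup_{m}\left\{ \frac{n H_2(k/n)}{m} : E_{\mathbf{G}, \mathbf{X}} P\left(\frac{1}{n} d(\mathbf{Z}, \hat{\mathbf{Z}}) \le d_0 \,\Big|\, \mathbf{G}, \mathbf{X}\right) \ge 1 - \epsilon \right\} \qquad (8)$$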
where the probability is again over $\mathbf{N}$. Since

$$E_{\mathbf{G}} \sup_{\mathbf{X}} P\left(\frac{1}{n} d(\mathbf{Z}, \hat{\mathbf{Z}}) > d_0 \,\Big|\, \mathbf{G}, \mathbf{X}\right) \ge \sup_{\mathbf{X}} E_{\mathbf{G}} P\left(\frac{1}{n} d(\mathbf{Z}, \hat{\mathbf{Z}}) > d_0 \,\Big|\, \mathbf{G}, \mathbf{X}\right) \ge E_{\mathbf{G}, \mathbf{X}} P\left(\frac{1}{n} d(\mathbf{Z}, \hat{\mathbf{Z}}) > d_0 \,\Big|\, \mathbf{G}, \mathbf{X}\right)$$

this implies that

$$C^{1}_{n,\epsilon}(\mathrm{SNR}, k, d_0) \le C^{2}_{n,\epsilon}(\mathrm{SNR}, k, d_0) \le C^{3}_{n,\epsilon}(\mathrm{SNR}, k, d_0) \qquad (9)$$

This chain of inequalities implies that an upper bound for the Bayesian sensing capacity is an upper bound
for the other notions as well. A lower bound for the worst-case sensing capacity (Equation (6)) is a lower
bound for the other notions as well. To derive the lower bound to sensing capacity we derive an upper bound
on the probability of error using Maximum Likelihood (ML) analysis that holds uniformly for all $\mathbf{X} \in \Xi^{(k)}$.
For this reason we primarily focus on the notions of Equation (6) and Equation (8). To avoid cumbersome
notation we drop the superscript denoting the different notions, namely, we employ $C_{n,\epsilon}(\cdot) = C^{i}_{n,\epsilon}(\cdot)$, since
it is usually clear from the context.

We propose an asymptotic definition for sensing capacity by letting $n \to \infty$ as follows.

Definition 2.1. Let $\{\alpha_n\}$ be any sequence of sparsity ratios where $k$ is either fixed or approaching infinity
linearly or sub-linearly with $n$. Sensing capacity is the supremum over all the sensing rates such that as the
signal dimension, $n$, the number of measurements, $m$, and the dimension of the (possibly) random sensing
matrix, $\mathbf{G} \in \mathbb{R}^{m \times n}$, approach infinity, there exists a sequence of estimators $\hat{\mathbf{Z}}$ such that the probability that
the distortion $\frac{1}{n} d(\mathbf{Z}, \hat{\mathbf{Z}})$ is below $d_0$ approaches one. Formally,

$$C(\mathrm{SNR}, \{\alpha_n\}, d_0) = \lim_{\epsilon \to 0} \limsup_{m, n \to \infty} C_{n,\epsilon}(\mathrm{SNR}, \alpha_n, d_0)$$

where we explicitly denote the dependence of capacity on SNR, sparsity sequence $\alpha_n$, and distortion level $d_0$.
In the following we begin by considering the case of exact support recovery for the family of $k$-sparse
sequences.



3 Support Recovery: Worst-Case Setting 

In this section we consider the problem of exact support recovery under the models of Equations (1) and (2)
for the non-random parameter set $\Xi^{(k)}_{\beta}$ given by Equation (4). Suppose $\hat{\mathbf{X}}$ is the estimate for $\mathbf{X}$ based on
measurements $\mathbf{Y}$. Recall that by exact support recovery we mean that,

$$P_e = E_{\mathbf{G}} \sup_{\mathbf{X} \in \Xi^{(k)}_{\beta}} P\{\mathrm{Supp}(\hat{\mathbf{X}}) \neq \mathrm{Supp}(\mathbf{X}) \mid \mathbf{X}, \mathbf{G}\} \to 0$$

where the probability is over $\mathbf{N}$. In this context one may also talk about sign pattern recovery,

$$P_e = E_{\mathbf{G}} \sup_{\mathbf{X} \in \Xi^{(k)}_{\beta}} P\{\mathrm{Sgn}(\hat{\mathbf{X}}) \neq \mathrm{Sgn}(\mathbf{X}) \mid \mathbf{X}, \mathbf{G}\} \to 0$$

Here the Sgn function is described by

$$\mathrm{Sgn}(X) = \begin{cases} 1, & \text{if } X > 0 \\ -1, & \text{if } X < 0 \\ 0, & \text{if } X = 0 \end{cases}$$

It is easy to see that the results derived below also hold for sign pattern recovery with appropriate adaptation
of the proof methodology; the subsequent results only differ by constant factors and in particular do
not change the resulting scaling laws. Therefore we will focus on the problem of support recovery. For this
set-up the following are our main results for the output and input noise models.

Theorem 3.1 (Output Noise Model: Necessity). Consider the output noise model of Equation (1) with
the signal set defined by Equation (4). Let $\mathbf{G}$ be any matrix such that the marginal distribution for each
component has zero mean with variance $1/m$. Then there exists no estimator that can recover the support if
$\mathrm{SNR} = o(\log n)$. Furthermore, for $\mathrm{SNR} = \Theta(\log n)$ support recovery is impossible if $m = o(k \log(n/k))$.







Figure 1: Figure illustrating the intuition behind our ML analysis for support recovery using binary X as an example. 
In the Figure Xo is the true signal that is taken to be the origin. Support error events with support errors more than 
1 are contained in union of events with support error of 1 before the sensing/compression operator G is applied. This 
property is essentially preserved under the transformation by G if the minimum singular value of matrix G is well 
behaved. 



The proof can be found in Section 3.1.2. Note that we do not have to assume that the components of the
sensing matrix are distributed IID. The proof of the theorem also shows that the number of measurements
cannot be decreased significantly unless SNR scales as $n^{\gamma}$ for some $\gamma > 0$. It is interesting to point out that
in contrast to the noiseless case where $2k + 1$ measurements are required for signal reconstruction, the presence of even
small noise (namely, variance scaling as $1/\log(n)$) significantly alters this fundamental bound.
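
The gap between the noiseless count of $2k + 1$ measurements and the $k \log(n/k)$ growth required under noise can be tabulated. The sketch below only compares orders of growth: the constants hidden in the $\Omega(\cdot)$ statements are not specified here, so the helper `noisy_measurement_order` (a hypothetical name) returns $k \log(n/k)$ with an implicit constant of one.

```python
import math

def noiseless_measurements(k):
    """Measurements quoted for the noiseless case: 2k + 1."""
    return 2 * k + 1

def noisy_measurement_order(n, k):
    """Order of growth k*log(n/k) under noise; the leading constant is left out."""
    return k * math.log(n / k)

k = 10
for n in (10**3, 10**6, 10**9):
    print(n, noiseless_measurements(k), round(noisy_measurement_order(n, k), 1))
```

For fixed $k$ the noiseless count does not depend on $n$, while the noisy order keeps growing with $\log n$.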

The following result characterizes a partial converse of Theorem 3.1.



Theorem 3.2 (Output Noise Model: Sufficiency). Suppose the sensing matrix, $\mathbf{G}$, in Equation (1) is
drawn from an IID Gaussian ensemble with each component $g_{ij} \sim \mathcal{N}(0, 1/m)$ and the signal set is given by
Equation (4). If $m = \Omega(n H_2(k/n)) = \Omega(k \log(n/k))$ and $\mathrm{SNR} = \Omega(\log n)$ then the ML algorithm can exactly
recover the support with high probability for all $k/n = \alpha_n < 0.04$. Alternatively, for any sensing matrix $\mathbf{G}$ with
$m > 2k + 1$ and $\mathrm{SNR} = \Theta(\log n)$ the ML algorithm can recover the support with high probability, if the
minimum singular value, $\sigma_{G,\min} = \min_{\mathbf{X} \in \Xi^{(2k)}} \frac{\|\mathbf{G}\mathbf{X}\|_2}{\|\mathbf{X}\|_2}$, is bounded away from zero.



Remark 3.1. Note that Theorem 3.2 for the deterministic case requires $\sigma_{G,\min}$ to be bounded away from
zero. One may question whether this requirement is fundamental. We argue that this is so here. Note that
the optimal decoder must compare different signals with supports smaller than $k$ and pick the most likely. If
$\sigma_{G,\min}$ is arbitrarily small, it implies that there are $k$ columns which are badly conditioned. In the presence
of noise a worst-case signal emanating from these $k$ sparse columns will go virtually undetected relative to



The proofs for the deterministic and stochastic sensing matrices appear in Sections 3.2 and 3.3. A
geometric intuition of the proof for deriving the sufficient condition is shown in Figure 1 for binary $\mathbf{X}$.
The proof is based on the fact that for Gaussian noise $\mathbf{N}$, before the compression operator $\mathbf{G}$ is applied, the
support errors larger than one are contained in the union of events with support error equal to one. We show
that this is largely true when the compression is applied as well. The proof for the random Gaussian matrix
$\mathbf{G}$ is based on the deterministic case. It turns out that the sparsity ratio $\alpha_n < 0.04$ controls the singular
values of the sub-matrix, namely, we can ensure $\sigma_{G,\min} > 0$ with high probability for sparsity ratios below
this number.
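
For very small problems the ML support decoder can be run by exhaustive search, which helps build intuition for the analysis. The sketch below is a simplification and not the procedure analyzed in the paper: it searches only over supports of size exactly `k`, fits amplitudes by unconstrained least squares (ignoring the lower bound on the non-zero amplitudes), and uses hypothetical names and sizes.

```python
import itertools
import numpy as np

def ml_support_decoder(G, y, k):
    """Exhaustive search: pick the size-k support with the smallest least-squares residual.

    Under Gaussian noise, minimizing ||y - G x||^2 over x supported on a set is
    the ML fit for that set; the amplitude constraint is ignored here.
    """
    n = G.shape[1]
    best_support, best_residual = None, np.inf
    for support in itertools.combinations(range(n), k):
        cols = G[:, list(support)]
        coeffs, *_ = np.linalg.lstsq(cols, y, rcond=None)
        residual = np.sum((y - cols @ coeffs) ** 2)
        if residual < best_residual:
            best_support, best_residual = set(support), residual
    return best_support

rng = np.random.default_rng(1)
n, m, k, snr = 12, 8, 2, 1e4
G = rng.normal(0.0, np.sqrt(1.0 / m), size=(m, n))
true_support = {3, 7}
x = np.zeros(n)
x[list(true_support)] = 1.0
y = G @ x + rng.standard_normal(m) / np.sqrt(snr)
estimate = ml_support_decoder(G, y, k)
print(estimate == true_support)
```

The search visits $\binom{n}{k}$ candidate supports, so it is only practical for toy sizes; its role here is to illustrate the decoder whose error events the analysis bounds.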






Note that we can also state these results in terms of sensing capacity. Formally, given any $\epsilon > 0$, there
is an $n(\epsilon)$ such that for all $n > n(\epsilon)$ and any monotonic sequence $\alpha_n < 0.04$, there are positive constants
$c_1$, $c_2$ such that

$$0 < c_1 \le C_{n,\epsilon}(\log(n), \alpha_n, 0) \le c_2$$

In contrast to the optimistic results for output noise models, we have the following pessimistic result for the
input noise model, whose proof can be found in Section 3.1.1.

Theorem 3.3 (Input Noise Model: Necessity). Consider the input noise model of Equation (2) with
the signal set defined by Equation (4) and $\mathbf{G}$ drawn from an IID Gaussian ensemble with each component

$g_{ij} \sim \mathcal{N}(0, 1/m)$. Let $\alpha_n$ be any positive monotonic sequence of sparsity ratios. Then recovery fails if

/nmax(log(n),log(^-))\ ,. , . ... 

m = o I pisNR — J ' Alternatively, the sensing capacity is zero. 

This says that for the input noise model one cannot expect meaningful compression in a noisy regime. 
To ensure support recovery either the SNR has to scale linearly with n, which implies essentially a noiseless 
regime, or the number of measurements must scale linearly with n with any meaningful level of noise. This 
calls into question the sensor network motivated compression schemes such as those presented in [3] where
the raw noisy measurements are randomly projected and transmitted to a fusion center. 



3.0.1 Achievable Distortion Regions for Support Recovery 

In this section we will describe results for approximate support recovery, namely, we allow some distortion in 
support recovery. An important implication of our result is that in the constant sparsity regime it is sufficient 
for SNR to be a constant independent of n if we accommodate a constant fraction of support errors. We 
account for the support distortion as 

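$$d(\mathbf{X}, \hat{\mathbf{X}}) = \frac{1}{k}\sum_{j=1}^{n} \left| I_{\{X_j \neq 0\}} - I_{\{\hat{X}_j \neq 0\}} \right|$$

where $I_{\{\cdot\}}$ is the indicator function.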

Theorem 3.4. Consider the observation model of Equation (1) with $\mathbf{G}$ drawn from a Gaussian ensemble.

Let X € and let d be as described above. Lt follows that if SNR > 2 ^ 2 " — - and m > 6nH 2 (2^) the 

probability of support error greater than distortion $d_0$ goes to zero. Consequently, it follows that for support
recovery with constant distortion, $d_0$, in the linear sparsity regime, i.e., $\alpha_n = k/n > \alpha > 0$, it is sufficient
for the SNR to be a constant independent of the signal dimension $n$.

Proof. The proof is based on the proof of Theorem 3.2 and we refer the reader to the appendix. □

Note that Theorem 3.4 only trades off SNR with the distortion. However one would expect that with
allowable distortion in support recovery it is possible to trade off the number of measurements with distortion.
In the following sections we will develop this tradeoff of number of measurements with the rate-distortion 
function by considering a Bayesian set-up. The main reason why this tradeoff is possible in a Bayesian set-up 
is due to the fact that before we analyzed a worst case set-up while in Bayesian case we analyze an average 
case scenario and it turns out that on an average the number of measurements can indeed be traded off with 
distortion. 



3.1 Proof of Theorems 3.3 and 3.1: Necessary Conditions

We derive necessary conditions based on lower bounds to the probability of error. As we pointed out in
Equation (9), putting a suitable measure on the signal $\mathbf{X}$ can provide necessary conditions for the worst-case setup.
This motivates employing different versions of Fano's Lemma to establish the results. The standard version
of the lemma appears in [18] and we repeat it here for the sake of completeness:
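Lemma 3.1. Suppose $\mathcal{X}$ is a finite discrete set and $\mathbf{X} \in \mathcal{X}$ is distributed uniformly over this finite set.
Let the observation $\mathbf{Y}$ be distributed according to the conditional distribution $P(\mathbf{Y} \mid \mathbf{X})$, with $\mathbf{X} \in \mathcal{X}$. Let $\hat{\mathbf{X}}(\mathbf{Y})$
denote the estimate of $\mathbf{X}$ given $\mathbf{Y}$. Then the probability of error in estimating $\mathbf{X}$ from $\mathbf{Y}$ is lower bounded
by,

$$P(\hat{\mathbf{X}}(\mathbf{Y}) \neq \mathbf{X}) \ge 1 - \frac{I(\mathbf{X}; \mathbf{Y}) + \log 2}{\log(|\mathcal{X}| - 1)}$$

where $I(\mathbf{X}; \mathbf{Y})$ denotes the mutual information between $\mathbf{X}$ and $\mathbf{Y}$.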
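
A Fano-type bound of the form $1 - (I + \log 2)/\log(|\mathcal{X}| - 1)$ is easy to evaluate numerically. The sketch below assumes that form, with the mutual information given in nats; the function name `fano_lower_bound` and the clipping at zero (the bound is vacuous when it is negative) are choices made for this example.

```python
import math

def fano_lower_bound(mutual_information_nats, alphabet_size):
    """Lower bound on the error probability for a uniform source on a finite alphabet."""
    if alphabet_size < 3:
        raise ValueError("alphabet_size must be at least 3 so that log(|X| - 1) > 0")
    bound = 1.0 - (mutual_information_nats + math.log(2)) / math.log(alphabet_size - 1)
    # A negative value carries no information about the error probability.
    return max(0.0, bound)

# With zero mutual information and 1025 hypotheses the bound is 1 - log(2)/log(1024).
uninformative = fano_lower_bound(0.0, 1025)
print(uninformative)
```

The bound shows why the necessary conditions follow from upper bounds on the mutual information: whenever $I(\mathbf{X}; \mathbf{Y})$ is small relative to the log-cardinality of the hypothesis set, the error probability stays bounded away from zero.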

An alternate version of Fano's lemma stated in [19] provides a lower bound for $N$-ary hypothesis testing.

Lemma 3.2. Let $(\mathcal{Y}, \mathcal{B})$ be a $\sigma$-field and let $P_1, \ldots, P_N$ be probability measures on $\mathcal{B}$ thought of as
induced by $N$ hypotheses $\{1, 2, \ldots, N\}$. Denote by $\theta(y)$ the estimator of the measures defined on $\mathcal{Y}$. Then

$$\max_{1 \le j \le N} P_j(\theta(y) \neq P_j) \ge \frac{1}{N}\sum_{j=1}^{N} P_j(\theta(y) \neq P_j) \ge 1 - \frac{\frac{1}{N^2}\sum_{i,j} D(P_i \| P_j) + \log 2}{\log(N - 1)}$$

where $P_i$ means the distribution conditioned on the hypothesis $i$ and $D(P_i \| P_j)$ is the Kullback-Leibler (KL)
distance between the distributions $P_i$ and $P_j$.

Note that the use of these Lemmas requires a finite number of hypotheses or discrete alphabets. Therefore,
in order to use these Lemmas for general $k$-sparse sequences $\mathbf{X} \in \Xi^{(k)}_{\beta}$ we first show that the worst case
probability of error in support recovery is lower bounded by the probability of error in support recovery for
$\mathbf{X}$ belonging to $k$-sparse sequences in $\{0, \beta\}^{n}$. To this end we have the following Lemma.

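Lemma 3.3. Let $\Xi^{(k)}_{\beta}$ be the family of $k$-sparse non-random sequences as defined in Equation (4). Denote
the conditional distribution of $\mathbf{Y}$ given $\mathbf{X}$ as $P(\mathbf{Y} \mid \mathbf{X})$. Let $\Xi^{(k)}_{\{0,\beta\}} = \{\mathbf{X} \in \Xi^{(k)}_{\beta} \mid X_j = \beta, \; j \in \mathrm{Supp}(\mathbf{X})\}$ be a
subset of $\Xi^{(k)}_{\beta}$ consisting of binary valued sequences. Let $\hat{\mathbf{X}}$ denote an estimator for $\mathbf{X}$ based on observation
$\mathbf{Y}$. Then,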

P e|G= min max P{Supp(X) ^Supp(X)|G,X} > min max P(X^X, Xe»jJ } JG,X) 



> min max P(X^X, XG5|^ } } |G,X) (10) 



Proof. See Appendix. □ 
The main idea behind the proofs of the results that follow is to first lower bound the error probability by using Lemma 3.3 and restrict attention to binary sequences. Next we further restrict the signal class to a smaller subset of the binary sequences of cardinality $n$. Then, finally using Lemma 3.2 we derive the lower bounds for the set of binary sequences. The lower bound thus obtained yields the necessary conditions.



3.1.1 Input Noise Model (Proof of Theorem 3.3)



From Lemma 3.3 it is sufficient to focus on the case when $X$ belongs to the set of $k$-sparse sequences in $\{0,\beta\}^n$ and any subset of these sequences. We will establish the first part of the Theorem as follows. Consider the subset of binary valued sequences with sparsity $\eta \le k$. Let $X_0$ be an arbitrary binary valued sequence with support size $|\mathrm{Supp}(X_0)| = \eta - 1$. Next choose $n$ elements $X_j$, $j = 1, 2, \dots, n$, with support size equal to $\eta$ and at a unit Hamming distance from $X_0$. Denote by $P_j$, $0 \le j \le n$, the induced distributions of the observations. Under the AWGN noise model, for a given $G$ and a fixed set of elements $X_j$, these distributions are Gaussian, i.e.,

$$H_j : Y \sim P_j = \mathcal{N}\left(GX_j, \frac{\Sigma}{\mathrm{SNR}}\right), \quad j = 0, 1, \dots, n$$

where $\Sigma = GG^T$. We therefore have $n + 1$ hypotheses. Consider now the support recovery problem. It is clear that the error probability can be mapped into a corresponding hypothesis testing problem. For this we consider $\theta(Y)$ as an estimate of one of the $n + 1$ distributions above and we have the following set of inequalities.

$$P_{e|G} = \max_{X} P_X(\hat{X} \neq X \mid G) = \max_{j} P_j(\theta(Y) \neq P_j \mid G) \geq \frac{1}{n+1}\sum_{j=0}^{n} P_j(\theta(Y) \neq P_j \mid G)$$

where we write $P_{e|G}$ to point out that the probability of error is conditioned on $G$. Applying Lemma 3.2, it follows that the probability of error in exact support recovery is lower bounded by,

$$P_{e|G} \geq 1 - \frac{\frac{1}{(n+1)^2}\sum_{i,j} D(P_i\|P_j) + \log 2}{\log(n)}$$

We observe that under AWGN noise $N$,

$$D(P_i\|P_j) = \mathrm{SNR}\,(X_i - X_j)^T G^T \Sigma^{-1} G (X_i - X_j) = \mathrm{SNR}\,(X_i - X_j)^T V \begin{bmatrix} I_m & 0 \\ 0 & 0 \end{bmatrix} V^* (X_i - X_j) \qquad (11)$$

where $\Sigma = GG^T$ and $G = U[\Lambda, 0]V^*$ is the SVD of $G$ with $V = [v_1, v_2, v_3, \dots, v_n] = [v_{rs}]$. The last equality in Equation (11) follows from straightforward algebraic manipulations. Now by noting that $(X_i - X_j)$ is at most a 2-sparse vector with its non-zero entries of magnitude $\beta$ at some locations $q$ and $p$, we can further reduce the last expression to $D(P_i\|P_j) = \mathrm{SNR}\,\beta^2 \sum_{l=1}^{m} (v_{pl} - v_{ql})^2$. Now using the standard rotational invariance properties of IID Gaussian matrices [20], namely that their singular vectors are uniformly distributed over a sphere, it follows by taking expectations and using symmetry that,

l 0g(n )_ n ^SNRm _ l 2 
Pe = E G P e | G > 81 ' "+ 1 , " (12) 

log(n) 

Now, the error probability is bounded away from zero by $\epsilon$ if the number of measurements scales as follows:

$$m = o\left(\frac{(n+1)\log(n)}{\beta^2\,\mathrm{SNR}}\right)$$

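To establish the second upper bound we consider the family $\Xi_{\{0,\beta\}}^{\{k\}}$ of exactly $k$-sparse binary valued sequences, which form a subset of $\Xi_\beta^{\{k\}}$. Following similar logic as in the proof of the first part, for the set of exactly $k$-sparse sequences we form the corresponding $\binom{n}{k}$ hypotheses. Then,

$$P_e = E_G P_{e|G} \geq \frac{\log\left(\binom{n}{k} - 1\right) - \frac{1}{\binom{n}{k}^2}\sum_{i,j} E_G D(P_i\|P_j) - \log 2}{\log\left(\binom{n}{k} - 1\right)} \qquad (13)$$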

We compute the average pairwise KL distance, 

i E m\\v s ) 

\k) 

k 

= -Lj2 SNR ( X - X') T G T S- 1 G(X - X').tJ (sequences X' at hamming distance 2j from X) 

The equality above follows from symmetry. Again using the standard rotational invariance properties of IID 
Gaussian matrices |20) . the above equation implies that , 

-L £ d(p« = ^£sa^^^ 

U) i,3,i& j= i V J J \3 J 

where the last equality follows from standard combinatorial identity. The proof then follows by noting that 
for large enough value of n, log((^) — 1) > a n nlog 
3.1.2 Output Noise Model (Proof of Theorem 3.1)



We will now establish Theorem 3.1, namely, that if $\mathrm{SNR} = o(\log(n))$ support recovery is impossible. Furthermore, if $\mathrm{SNR} = O(\log(n))$, support recovery is impossible if the number of measurements scales as $o(k\log(n/k))$. The first part follows from the following Proposition.

Proposition 3.1 (Output noise model - SNR Bound). For the observation model of Equation (7) with the signal set defined earlier, the SNR must scale as $\frac{\log(n)}{2\beta^2}$ for perfect support recovery irrespective of which sensing matrix is used.



Proof. The proof follows along the same lines as the proof of Theorem 3.3 with $\Sigma = I$ up to Equation (11). In the Kullback-Leibler distance calculation we are now left with the term $G^T G$. Since $G$ is normalized its expected value is the identity. Therefore, we no longer get the factor $n/m$ in Equation (12). Consequently, following the rest of the steps we have that $2\beta^2\,\mathrm{SNR} \geq \log(n)$ is needed for exact support recovery. □



Next we establish what happens for $\mathrm{SNR} = O(\log(n))$ to prove the second part of Theorem 3.1. First, note that if the sparsity, $k$, grows linearly with the signal dimension, $n$, there is nothing to prove, since it is well-known [1] that the number of measurements must scale at least as $2k + 1 = \Omega(n)$ even when there is no noise to guarantee support recovery. For this reason we focus on the sub-linear case, namely $k = n^{\gamma}$, $\gamma < 1$. We consider the subset $\Xi_{\{0,\beta\}}^{\{k\}}$ consisting of strictly $k$-sparse sequences taking values in $\{0, \beta\}^n$. From Lemma 3.3 we see that it is sufficient to focus on this set. Applying Lemma 3.1 with a uniform prior on the support set we get

$$\max_{X} P(\hat{X} \neq X \mid X, G) \geq P(\hat{X} \neq X \mid G) \geq 1 - \frac{I(X;Y|G) + \log 2}{\log(|\mathcal{X}| - 1)} \qquad (14)$$

where $\mathcal{X} = \Xi_{\{0,\beta\}}^{\{k\}} \subset \{0, \beta\}^n$ is the discrete alphabet in which values of $X$ are realized. The first inequality follows because the worst-case probability of error is larger than the Bayesian error.

Note that, strictly speaking, since we are interested in the support errors, the probability of error events and the mutual information term must contain the support of $X$ as the variable; but since we are restricting ourselves to binary valued sequences $X \in \Xi_{\{0,\beta\}}^{\{k\}}$, knowing the support implies that we know $X$.

Now $\log|\mathcal{X}| = \log\binom{n}{k}$ since there are $\binom{n}{k}$ such hypotheses, consisting of all the possible support locations with cardinality $k$. We will now upper bound the mutual information term. It follows that,

$$I(X;Y|G) = h(Y|G) - h(Y|X,G) \leq h(Y) - h(N) \overset{(a)}{\leq} \sum_{i=1}^{m} h(Y_i) - \frac{m}{2}\log\left(\frac{2\pi e}{\mathrm{SNR}}\right)$$
$$\overset{(b)}{\leq} \frac{m}{2}\log\left(2\pi e\left(\frac{k\beta^2}{m} + \frac{1}{\mathrm{SNR}}\right)\right) - \frac{m}{2}\log\left(\frac{2\pi e}{\mathrm{SNR}}\right) = \frac{m}{2}\log\left(1 + \frac{k\beta^2\,\mathrm{SNR}}{m}\right)$$

where $h(\cdot)$ is the differential entropy; (a) follows from the fact that the noise is Gaussian and the chain rule together with the fact that conditioning reduces entropy; (b) follows from the fact that the Gaussian distribution maximizes differential entropy for a given variance. Now from Equation (14) it follows that, for the error probability to vanish, the number of measurements must satisfy,

$$m \geq \frac{\log\left(\binom{n}{k} - 1\right) - \log 2}{\frac{1}{2}\log\left(1 + \frac{k\beta^2\,\mathrm{SNR}}{m}\right)} \qquad (15)$$
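The condition above is implicit in $m$, so a small numerical sketch may help. It assumes the requirement reads $\frac{m}{2}\log(1 + k\beta^2\mathrm{SNR}/m) \geq \log(\binom{n}{k} - 1) - \log 2$, which is what combining the Fano bound with the mutual information bound gives; the function names and the example values are hypothetical.

```python
import math


def mutual_information_bound(m: int, k: int, beta: float, snr: float) -> float:
    """Upper bound (m/2) log(1 + k beta^2 SNR / m) on I(X; Y | G), in nats."""
    return 0.5 * m * math.log(1.0 + k * beta ** 2 * snr / m)


def min_measurements(n: int, k: int, beta: float, snr: float):
    """Smallest m <= n for which the assumed necessary condition can hold.

    Returns None when no m in 1..n satisfies it, i.e. when the SNR is too
    low for any number of measurements up to the signal dimension.
    """
    target = math.log(math.comb(n, k) - 1) - math.log(2)
    for m in range(1, n + 1):
        if mutual_information_bound(m, k, beta, snr) >= target:
            return m
    return None


# Hypothetical setting: n = 1000, k = 10, beta = 1.
print(min_measurements(1000, 10, 1.0, 100.0))
print(min_measurements(1000, 10, 1.0, math.log(1000)))
```

Because $\frac{m}{2}\log(1 + c/m)$ increases with $m$ but saturates at $c/2$, a low SNR cannot be compensated by adding measurements; the search then returns `None`.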



Next, unless $\mathrm{SNR} = \Omega(\log(n))$ we know from Proposition 3.1 that support recovery is impossible. Hence we set $\mathrm{SNR} = \log(n)$, which is the minimum possible. We next establish the theorem by contradiction. To this end let the number of measurements scale as $m = \rho_n \log\binom{n}{k}$, with $\rho_n \to 0$; then, by rearranging the terms in Equation (15) we get

/ kfSNR \ log2 log ((g) -1) 
" { l0 §(G)) ) lQg(C)) - Pn log(O) [ 
Next note that the expression on the left can be simplified by noting that

kfi 2 SNR ( 1 

while the expression on the right has the scaling Consequently, if maxim um admissible sparsity, k, 



grows sub-linearly with n then log(H — ^f^^jy) = 6(log(l + Pn 1 )) and Equation (16) can never be satisfied 



since $\rho_n \to 0$. This shows that for sub-linear cases recovery is impossible if $m = o\left(\log\binom{n}{k}\right) = o(nH_2(\alpha_n))$.

Remark 3.2. Note that unless SNR scales as $n^{\delta}$ for some $\delta > 0$ we will still need the measurements to scale as $\Omega(k\log(n/k))$ to guarantee support recovery.

3.2 Proof of Theorem 3.2: Deterministic Case

In this section we derive sufficient conditions for support recovery for the output noise model for any given deterministic matrix $G$ and for a general noise covariance $\Sigma$. For the output noise model, we assume that each column of the deterministic $G$ is normalized. Subsequently we specialize these results to the case when $G$ is chosen from the Gaussian ensemble and $\Sigma = I$.

To simplify the exposition we introduce several new variables. We associate each admissible signal, $X \in \Xi_\beta^{\{k\}}$, with its support, $S$. We denote by $X_S$ the signal, $X$, restricted to the set of components indexed by $S$. Similarly, we denote by $G_S$ the matrix formed from the columns indexed by $S$. Since the maximum sparsity level is $k$, the number of different support sets is equal to $\sum_{j=1}^{k}\binom{n}{j} - 1$. We index the different support sets as $S_\omega$ with $\omega \in \mathcal{I} = \left\{0, 1, 2, \dots, \sum_{j=1}^{k}\binom{n}{j} - 1\right\}$. Also we denote by $X_\omega^{\min}$ the minimum absolute value of the components of the signal $X$ on the support set $S_\omega$, i.e., $X_\omega^{\min} = \min\{|X_j| : j \in S_\omega\}$. Without loss of generality we assume that the true signal is $X_0$ and that the support set of the true signal is $S_0$, corresponding to $\omega = 0$. We denote by $X_{0,j}$ the $j$th component of the true signal.

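For any $\omega \neq 0$, we denote the overlapping support by $S_{0,\omega}$, false detection by $S_{0^c,\omega}$ and missed detection by $S_{0,\omega^c}$, namely,

Overlap: $S_{0,\omega} = S_0 \cap S_\omega$
False Alarms: $S_{0^c,\omega} = S_0^c \cap S_\omega$
Misses: $S_{0,\omega^c} = S_0 \cap S_\omega^c$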
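The three sets above are plain set operations on index sets. The following minimal sketch, with a hypothetical helper name and example indices, makes the decomposition concrete.

```python
def support_partition(true_support, candidate_support):
    """Split a candidate support against the true support S_0.

    Returns the overlap S_0 & S_w, the false alarms S_w - S_0 and the
    misses S_0 - S_w, mirroring the three definitions in the text.
    """
    s0, sw = set(true_support), set(candidate_support)
    return {
        "overlap": s0 & sw,
        "false_alarms": sw - s0,
        "misses": s0 - sw,
    }


# Hypothetical example: true support {1, 4, 7}, candidate support {4, 7, 9}.
parts = support_partition({1, 4, 7}, {4, 7, 9})
print(parts)
```

Every candidate support $S_\omega$ is the disjoint union of its overlap and false alarms, while the true support is the disjoint union of the overlap and the misses.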

For a given noise covariance $\Sigma$ the ML estimator is given by,

$$\hat{X} = \arg\min_{X \in \Xi_\beta^{\{k\}}} (Y - GX)^T \Sigma^{-1} (Y - GX)$$

The above ML estimator is hard to analyze. In order to simplify the analysis we will consider a sub-optimal ML estimator. To this end consider the set $\Xi_{\beta/2}^{\{k\}}$. Clearly, $\Xi_\beta^{\{k\}} \subset \Xi_{\beta/2}^{\{k\}}$. We propose the following sub-optimal ML estimator,

$$\hat{X} = \arg\min_{X \in \Xi_{\beta/2}^{\{k\}}} \|Y - GX\|^2 \qquad (17)$$

and report $\mathrm{Supp}(\hat{X})$ as the final solution. Note that this estimator is sub-optimal since it is prone to more errors. To see this note that we consider a larger signal set and we ignore possible noise correlation $\Sigma$ in our estimator. Consequently, the error probability in detecting the correct support can only be larger than that of the optimal ML estimator. The error probability of the relaxed estimator therefore provides an upper bound on the error probability of the ML estimator. Hence, we can write,

$$P_{e|G}^{ML} \leq P_{e|G} = P\left(N : \min_{\omega \neq 0,\ X_{S_\omega}^{\min} \geq \beta/2} \|Y - G_{S_\omega}X_{S_\omega}\|^2 \leq \min_{X_{S_0}^{\min} \geq \beta/2} \|Y - G_{S_0}X_{S_0}\|^2\right) \qquad (18)$$

Note that in the above expression $X_{S_0}$ is not the true signal, $X_0$, but any other signal whose support is identical to that of the true signal. We then have the following result.
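Lemma 3.4.

$$P_{e|G}^{ML} \leq P_{e|G} \leq P(\mathcal{E}_1) + P(\mathcal{E}_2)$$

where

$$\mathcal{E}_1 = \left\{N : \min_{\omega \neq 0,\ X_{S_\omega}^{\min} \geq \beta/2} \|Y - G_{S_\omega}X_{S_\omega}\|^2 \leq \min_{X} \|Y - G_{S_0}X\|^2\right\}$$

$$\mathcal{E}_2 = \left\{N : \|(G_{S_0}^T G_{S_0})^{-1} G_{S_0}^T N\|_\infty \geq \beta/2\right\}$$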

Proof. First note the following qualitative points. In the event $\mathcal{E}_1$ we have replaced the constrained minimization on the R.H.S. of the inequality in the error event with an unconstrained one. This will simplify the subsequent analysis as closed form expressions can be obtained. The event $\mathcal{E}_2$ captures the probability that the unconstrained minimization in $\mathcal{E}_1$ is very far from the constrained one. Here we use the fact that the minimum component on the support of the true signal $X_0$ is greater than $\beta$. We also relax our ML estimator so that we find a best fit with any signal sharing the same support set, $S_0$, as $X_0$ but with $X_{S_0}^{\min} \geq \beta/2$. Now, denote

$$A = \left\{N : \min_{\omega \neq 0,\ X_{S_\omega}^{\min} \geq \beta/2} \|Y - G_{S_\omega}X_{S_\omega}\|^2 \leq \min_{X_{S_0}^{\min} \geq \beta/2} \|Y - G_{S_0}X_{S_0}\|^2\right\}$$

$$B = \left\{N : \min_{X_{S_0}^{\min} \geq \beta/2} \|Y - G_{S_0}X_{S_0}\|^2 = \min_{X} \|Y - G_{S_0}X\|^2\right\}$$

Then we have

$$P_e = P(A) = P(A \cap B) + P(A \cap B^c) \leq P(A \cap B) + P(B^c)$$

The Lemma then follows by noting that,

$$A \cap B = \left\{N : \min_{\omega \neq 0,\ X_{S_\omega}^{\min} \geq \beta/2} \|Y - G_{S_\omega}X_{S_\omega}\|^2 \leq \min_{X_{S_0}^{\min} \geq \beta/2} \|Y - G_{S_0}X_{S_0}\|^2\right\} \cap B$$
$$= \left\{N : \min_{\omega \neq 0,\ X_{S_\omega}^{\min} \geq \beta/2} \|Y - G_{S_\omega}X_{S_\omega}\|^2 \leq \min_{X} \|Y - G_{S_0}X\|^2\right\} \cap B \subseteq \mathcal{E}_1$$

and,

$$B^c = \left\{N : \min_{X_{S_0}^{\min} \geq \beta/2} \|Y - G_{S_0}X_{S_0}\|^2 \neq \min_{X} \|Y - G_{S_0}X\|^2\right\} \subseteq \mathcal{E}_2$$

□

From the above Lemma, it is sufficient to focus on events $\mathcal{E}_1$ and $\mathcal{E}_2$ separately. The following lemma provides a result that considerably simplifies the error event $\mathcal{E}_1$. It turns out that the event $\mathcal{E}_1$ is a subset of a union of atomic events, namely,

Lemma 3.5. For $m \geq 2k + 1$,

$$\mathcal{E}_1 \subseteq \tilde{\mathcal{E}}_1 = \bigcup_{X \in \{\beta/2, -\beta/2\}} \bigcup_{j=1}^{n} \left\{N : 2N^T g_j X \geq \sigma_{G,\min} |X|^2\right\}$$

where $g_j$ is the $j$-th column of the matrix $G$ and

$$\sigma_{G,\min} = \min_{|S| \leq 2k} \sigma_{\min}(G_S^T G_S) \qquad (19)$$

where $\sigma_{\min}(G_S^T G_S)$ denotes the minimum singular value of $G_S^T G_S$.
Proof. See Appendix. □

We now have the following Lemma.

Lemma 3.6. Consider the output noise model for a deterministic matrix $G$ with $m \geq 2k + 1$ and $N$ distributed as $\mathcal{N}(0, \Sigma)$. The probability of the error event $\mathcal{E}_1$ is upper bounded by,

$$P(\mathcal{E}_1) \leq \exp\left\{-\frac{\sigma_{G,\min}^2\,\lambda_{\min}(\Sigma^{-1})\,\beta^2\,\mathrm{SNR}}{32}\right\}\exp\{\log 2n\} \qquad (20)$$

where $\lambda_{\min}(\Sigma^{-1})$ is the minimum eigenvalue of the matrix $\Sigma^{-1}$.

Proof. See Appendix. □

We now have the following Lemma for the error event $\mathcal{E}_2$. Again note that the result applies to any matrix $G$ (not necessarily Gaussian).

Lemma 3.7. For the setup of Lemma 3.6 we have,

P(£ 2 ) < exp |-a G min + log2n 

Proof. See Appendix. □ 



By combining Lemmas 3.6 and 3.7 we can prove the deterministic case of Theorem 3.2. We state it as a proposition since we will refer to it later.

Proposition 3.2. Consider the setup of Lemma 3.6. Then for exact support recovery it is sufficient that

$$m \geq 2k + 1 \quad \text{and} \quad \mathrm{SNR} = \Omega\left(\frac{1}{\sigma_{G,\min}^2}\,\frac{64\log 2n}{\lambda_{\min}(\Sigma^{-1})\,\beta^2}\right)$$

Proof. From Lemmas 3.6 and 3.7 it follows that for $m \geq 2k + 1$,



2 A min (S- 1 )/3 2 ^i? \ / 2 x^h-^/Psnr 



P e |G < exp j-er Gmin — + log 2n| + exp <j -cr Giluin + log 2n 

< 2 exp I -ct g min — + log 2n 

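Therefore for $\mathrm{SNR} = 2\,\sigma_{G,\min}^{-2}\,\frac{32\log 2n}{\lambda_{\min}(\Sigma^{-1})\,\beta^2}$ the probability of error satisfies $P_{e|G} \leq 2e^{-\log 2n}$. Thus as $n \to \infty$, $P_{e|G}$ goes to zero as $\frac{1}{n}$. This implies that an SNR scaling of $O\left(\sigma_{G,\min}^{-2}\,\frac{\log n}{\lambda_{\min}(\Sigma^{-1})\,\beta^2}\right)$ is sufficient. □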
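The arithmetic behind the final step can be checked numerically. The sketch below assumes the combined bound has the form $2\exp\{-\sigma_{G,\min}^2\lambda_{\min}(\Sigma^{-1})\beta^2\mathrm{SNR}/32 + \log 2n\}$; the function names and the example parameters are hypothetical.

```python
import math


def sufficient_snr(n: int, sigma_min: float, lam_min: float, beta: float) -> float:
    """SNR level 64 log(2n) / (sigma_min^2 * lam_min * beta^2) used in the proof."""
    return 64.0 * math.log(2 * n) / (sigma_min ** 2 * lam_min * beta ** 2)


def error_bound(n: int, sigma_min: float, lam_min: float, beta: float, snr: float) -> float:
    """Assumed bound 2 exp(-sigma_min^2 lam_min beta^2 SNR / 32 + log 2n)."""
    exponent = -(sigma_min ** 2) * lam_min * beta ** 2 * snr / 32.0 + math.log(2 * n)
    return 2.0 * math.exp(exponent)


# Hypothetical parameters: n = 500, sigma_min = 0.8, lam_min = 1, beta = 0.5.
n = 500
snr = sufficient_snr(n, 0.8, 1.0, 0.5)
print(snr, error_bound(n, 0.8, 1.0, 0.5, snr))
```

At this SNR the exponent equals $-2\log 2n + \log 2n = -\log 2n$, so the bound evaluates to $2/(2n) = 1/n$ regardless of the other parameters.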



3.3 Proof of Theorem 3.2: Gaussian Case

We will now focus on sensing matrices, $G$, drawn from an IID Gaussian ensemble. As in the deterministic case we need to bound the probabilities of the events $\mathcal{E}_1$ and $\mathcal{E}_2$. We will first focus our attention on event $\mathcal{E}_1$. We point out that the proof for the deterministic case cannot be directly applied. First, note that $\sigma_{G,\min}$ of Equation (19) is now a random variable. Therefore, we need to average over this random variable in computing an upper bound to the probability of the events $\mathcal{E}_1$, $\mathcal{E}_2$. A second problem is that in the deterministic case we assumed that the $\ell_2$ norm of each column, $g_j$, is deterministically normalized to unity. In the Gaussian case only the expected power is normalized to unity. Note also that for the output noise model considered in this paper $\Sigma = I$. Therefore $\lambda_{\min}(\Sigma^{-1}) = 1$. Following along the lines of the proof of Lemma 3.6 we see that,

$$P(\mathcal{E}_1 \mid G) \leq \exp\left\{-\frac{\sigma_{G,\min}^2\,\beta^2\,\mathrm{SNR}}{32\max_j \|g_j\|_2^2}\right\}\exp\{\log 2n\}$$
We now need to characterize a lower bound for $\frac{\sigma_{G,\min}^2}{\max_j \|g_j\|_2^2}$. To this end we observe that,

$$\Pr\left(\frac{\sigma_{G,\min}^2}{\max_j \|g_j\|_2^2} \geq \frac{(1-\eta)^2}{1+\epsilon}\right) \geq \Pr\left(\sigma_{G,\min}^2 \geq (1-\eta)^2,\ \max_j \|g_j\|_2^2 \leq 1 + \epsilon\right) \qquad (21)$$
$$\geq 1 - \left(\Pr\left(\sigma_{G,\min}^2 \leq (1-\eta)^2\right) + \Pr\left(\max_j \|g_j\|_2^2 \geq 1 + \epsilon\right)\right)$$

This implies that we should characterize $\sigma_{G,\min}$ and $\max_j \|g_j\|_2^2$ separately. We appeal to the following lemma in [2] to characterize $\sigma_{G,\min}$.

Lemma 3.8. Suppose the sparsity is $\alpha_n = k/n$ and we consider the function $f(q) := \sqrt{n/m}\left(\sqrt{q} + \sqrt{2H_2(q)}\right)$, where $H_2(q) := -q\log q - (1-q)\log(1-q)$. Let $G$ be an $m \times n$ matrix drawn from a Gaussian ensemble with $g_{ij} \sim \mathcal{N}(0, 1/m)$. Then it follows that $\sigma_{G,\min}$ described in Equation (19) has the following concentration property,

P (<T G ,min < 1 - rf) < 2 exp (- neH2 } a ^ ) t 5l {n,a n ,e) (22) 



where $\eta = 2(1 + \epsilon)f(2\alpha) + (1 + \epsilon)^2 f^2(2\alpha)$.

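We consider the following concentration result to characterize the maximum power of the columns of $G$.

Lemma 3.9. Let $G$ be drawn from an IID Gaussian ensemble with $g_{ij} \sim \mathcal{N}(0, 1/m)$. Let $g_j$, $j = 1, 2, \dots, n$ be the columns of $G$. Then, for any $\epsilon > 0$, it follows that,

$$P\left(\max_j \|g_j\|_2^2 \geq 1 + \epsilon\right) \leq \exp\left(-\frac{m}{2}\left(\epsilon - \log(1 + \epsilon)\right) + \log n\right) = \delta_2(m, n, \epsilon)$$

Proof. Clearly $X := m\|g\|_2^2$ is $\chi^2$ distributed with $m$ degrees of freedom and its moment generating function is $E(e^{tX}) = (1 - 2t)^{-m/2}$. From the Chernoff bound,

$$\Pr(X \geq a) \leq \frac{E(e^{tX})}{e^{ta}} = (1 - 2t)^{-m/2} e^{-ta}$$

Choosing $a = m(1 + \epsilon)$ and $t = \frac{1}{2}(1 - m/a) = \frac{\epsilon}{2(1+\epsilon)}$, we have

$$\Pr\left(\|g\|_2^2 \geq 1 + \epsilon\right) \leq \exp\left(-\frac{m}{2}\left(\epsilon - \log(1 + \epsilon)\right)\right)$$

The proof then follows by employing the union bound. □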
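The Chernoff step can be sanity-checked numerically. The sketch below evaluates the bound $(1-2t)^{-m/2}e^{-ta}$ at $t = \epsilon/(2(1+\epsilon))$, compares it with the closed form $\exp(-\frac{m}{2}(\epsilon - \log(1+\epsilon)))$, and compares both with a seeded Monte Carlo estimate of the tail. All function names, the trial count and the example values $m = 20$, $\epsilon = 0.5$ are assumptions for illustration.

```python
import math
import random


def chernoff_closed_form(m: int, eps: float) -> float:
    """exp(-(m/2) (eps - log(1 + eps))): bound for a single column."""
    return math.exp(-0.5 * m * (eps - math.log1p(eps)))


def chernoff_at(m: int, eps: float, t: float) -> float:
    """Generic Chernoff bound (1 - 2t)^(-m/2) exp(-t a) with a = m (1 + eps)."""
    a = m * (1.0 + eps)
    return (1.0 - 2.0 * t) ** (-m / 2.0) * math.exp(-t * a)


def delta2(m: int, n: int, eps: float) -> float:
    """Union bound over the n columns: the single-column bound times n."""
    return n * chernoff_closed_form(m, eps)


def empirical_tail(m: int, eps: float, trials: int, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr(chi^2_m >= m (1 + eps)) with a fixed seed."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(m))
        if x >= m * (1.0 + eps):
            hits += 1
    return hits / trials


m, eps = 20, 0.5
t_star = eps / (2.0 * (1.0 + eps))
print(chernoff_at(m, eps, t_star), chernoff_closed_form(m, eps))
print(empirical_tail(m, eps, trials=2000))
```

The chosen $t$ minimizes the Chernoff expression, so nearby values of $t$ give a looser bound; the simulated tail should sit well below the closed form.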



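Putting Lemmas 3.8 and 3.9 together with Equation (21) and taking the expectation with respect to $G$ we get,

$$P(\mathcal{E}_1) = E_G\left(P(\mathcal{E}_1 \mid G)\,I_{\Gamma} + P(\mathcal{E}_1 \mid G)\,I_{\Gamma^c}\right) \leq \exp\left\{-\frac{(1-\eta)^2\beta^2\,\mathrm{SNR}}{(1+\epsilon)\,32}\right\}\exp\{\log 2n\}(1 - \delta) + \delta$$

where $\Gamma = \left\{G : \frac{\sigma_{G,\min}^2}{\max_j \|g_j\|_2^2} \geq \frac{(1-\eta)^2}{1+\epsilon}\right\}$ and $\delta = \delta_1(n, \alpha_n, \epsilon) + \delta_2(m, n, \epsilon)$. Note that $P(\Gamma^c) \leq \delta$ and $\delta$ can be made arbitrarily small for $m = \Omega(\log(n))$ and $k$ sufficiently large. We are now left to ensure that the first term in the RHS of the above equation can be made small as well. For this purpose we need

$$\frac{(1-\eta)^2\beta^2\,\mathrm{SNR}}{(1+\epsilon)\,32} = (1 + \gamma)\log 2n \qquad (23)$$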
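for some arbitrary $\gamma > 0$. Let $\eta_1 = \left(\frac{32(1+\gamma)(1+\epsilon)\log 2n}{\beta^2\,\mathrm{SNR}}\right)^{1/2}$. This implies that it is sufficient that

$$1 - \eta \geq \eta_1 \implies \eta \leq 1 - \eta_1 \qquad (24)$$
$$(1 + \epsilon)f(2\alpha)\left(2 + (1 + \epsilon)f(2\alpha)\right) + 1 \leq 1 + (1 - \eta_1) \qquad (25)$$
$$\implies \left(1 + (1 + \epsilon)f(2\alpha)\right)^2 \leq 2 - \eta_1 \implies (1 + \epsilon)f(2\alpha) \leq \sqrt{2 - \eta_1} - 1 \qquad (26)$$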
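The chain of implications is a completion of the square, which the following sketch checks numerically. Here `a` stands for $(1+\epsilon)f(2\alpha)$, so that $\eta = 2a + a^2$; the function names and the example values are hypothetical.

```python
import math


def eta_from_a(a: float) -> float:
    """eta = 2a + a^2, with a standing for (1 + eps) f(2 alpha)."""
    return 2.0 * a + a * a


def largest_admissible_a(eta1: float) -> float:
    """Largest a with eta_from_a(a) <= 1 - eta1, i.e. sqrt(2 - eta1) - 1."""
    if not 0.0 <= eta1 < 1.0:
        raise ValueError("the condition can only hold for 0 <= eta1 < 1")
    return math.sqrt(2.0 - eta1) - 1.0


def eta1_from_snr(n: int, beta: float, snr: float, gamma: float, eps: float) -> float:
    """eta1 = sqrt(32 (1 + gamma)(1 + eps) log(2n) / (beta^2 SNR))."""
    return math.sqrt(32.0 * (1.0 + gamma) * (1.0 + eps) * math.log(2 * n) / (beta ** 2 * snr))


# Hypothetical value: eta1 = 0.5 leaves room for a up to sqrt(1.5) - 1.
a_max = largest_admissible_a(0.5)
print(a_max, eta_from_a(a_max))
```

Since $(1 + a)^2 = 1 + \eta$, the requirement $\eta \leq 1 - \eta_1$ is the same as $(1+a)^2 \leq 2 - \eta_1$, and larger SNR (smaller $\eta_1$) permits a larger $f(2\alpha)$, hence fewer measurements.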

For this inequality to be satisfied we need $\eta_1 < 1$. A sufficient condition for support recovery can be obtained by substituting for $\eta_1$ and we get

_ 32(l+ 7 )(l + 6)log(2n) n 1 /— — 2 



since $\left(\sqrt{2\alpha} + \sqrt{2H_2(2\alpha)}\right)^2 \leq 6H_2(2\alpha)$ and $\gamma$, $\epsilon$ can be made arbitrarily small, the result now follows for event $\mathcal{E}_1$.

We are now left to bound the probability of event $\mathcal{E}_2$. This case is simple since the normalizing factor $\max_j \|g_j\|_2^2$ is no longer relevant, as seen from the proof of Lemma 3.7. It suffices to ensure that $\sigma_{G,\min}$ is bounded away from zero. However, note that we already have this from bounding the probability of event $\mathcal{E}_1$. The result now follows.



4 Recovery for Arbitrary Distortions: Bayesian signal model 

In this section we switch to a Bayesian signal model from the worst-case setting considered in the previous 
section. There are a number of reasons for considering such a model: 

(A) For both the input and output noise models we need the SNR to scale as $O(\log(n))$ for exact support recovery regardless of the number of measurements.

(B) For exact support recovery in the worst-case setup we require that the minimum singular values of all sub-matrices of $G$ as described in Equation (19) be uniformly bounded away from zero (Theorem 3.2). This arises because a worst-case signal, $X$, matched to the smallest singular value can be chosen. However, this problem may not arise in the average case setting.

(C) The situation is worse for the input noise model. Even with an SNR of $\Omega(\log(n))$ the number of measurements required is linearly proportional to the signal dimension.

(D) Theorem 3.4 points out that even with distortion we can only hope to reduce the SNR but not the number of measurements.

Consequently, it is worth exploring whether these results can be improved in the average Bayesian case. Fundamentally, the idea is that if we remove a sufficiently small set of signals then it is conceivable that the results could be more promising.

In the following we first develop novel lower and upper bounds to the probability of error subject to a distortion in reconstruction. The main ingredient in realizing these bounds is the use of the minimal covering property of the rate distortion function. We begin with a minimal cover as a functional mapping of the source to the set of rate distortion quantization points. Then for the lower bound to the probability of error we follow the steps of the proof of Fano's inequality [18], which we appropriately modify to address detection of the correct quantization point corresponding to the true $X$. Similarly for the upper bound to the probability of error we propose a minimum distance decoder (ML decoder for AWGN noise) over the set of rate distortion quantization points and derive a closed form result for the particular case of $\ell_2$ distortion.



4.1 Lower bound: modified Fano's inequality

In the following we will use $X$ and $X^n$ interchangeably. The main reason for introducing this notation is that we will deal with $n$-dimensional probability distributions over $X$ induced by the product measure $P_{X^n} = P_X \times \dots \times P_X$ ($n$ times).

Lemma 4.1. Suppose we are given observation(s) $Y$ for the sequence $X^n = \{X_1, \dots, X_n\}$ of random variables drawn IID with $X_i \sim P_X$. Let $\hat{X}^n(Y)$ be the reconstruction of $X^n$ from $Y$. Let the distortion measure be given by $d(X^n, \hat{X}^n(Y)) = \sum_{i=1}^{n} d(X_i, \hat{X}_i(Y))$. Then given $\epsilon > 0$, for sufficiently large $n$ we have

$$P\left(\frac{1}{n}d(\hat{X}^n(Y), X^n) \geq d_0\right) \geq \frac{R_X(d_0) - K(d_0, n) - \frac{1}{n}I(X^n; Y)}{R_X(d_0) + \epsilon}$$

where $K(d_0, n)$ is the logarithm of the number of neighbors of a quantization point in the $n$-dimensional rate-distortion mapping and $R_X(d_0)$ is the corresponding (scalar) rate distortion function for $X$.

We have the following result for the special case of finite alphabets with Hamming distortion. 

Lemma 4.2. Suppose we are given observation(s) $Y$ for the sequence $X^n = \{X_1, \dots, X_n\}$ of random variables drawn i.i.d. according to $P_X$ with $X_i \in \mathcal{X}$, $|\mathcal{X}| < \infty$. Let $\hat{X}^n(Y)$ be the reconstruction of $X^n$ from $Y$. For Hamming distortion $d_H(\cdot, \cdot)$ and for distortion levels,

d Q < min i 1/2, (\X\ - 1) min P x (at) 



1 4(I»,r ( Y))>J > nR xido )-I(X«;Y)-l-lo g n do 
n 



n\og{\X\) - n (ff a (do) + dolog(|Af| - 1) + ^) 



4.2 Constructive upper bound to probability of error for $\ell_2$ distortion

In this section we will provide a constructive upper bound to the probability of error in reconstruction subject to an average squared distortion level for the output noise model. To this end assume that we are given a minimal $d_0$ cover as described in Theorem 8.1 of [21]. Specifically, we have a set of balls, $B_i \subset \mathbb{R}^n$, $i = 1, 2, \dots, 2^{n(R_X(d_0) + \epsilon)}$, of diameter $2\sqrt{nd_0}$ such that, for any $\epsilon > 0$, we have for sufficiently large $n$ that,

$$\Pr\left\{\bigcup_{i=1}^{N_\epsilon(n, d_0)} B_i\right\} \geq 1 - \epsilon$$

where $R_X(d_0)$ is the (scalar) rate distortion function for $X \sim f_X$ and $N_\epsilon(n, d_0) = 2^{n(R_X(d_0) + \epsilon)}$. Each ball $B_i$ is represented by a quantization point $Z_i = Z_i^n$. Thus with high probability for any $X$ there exists a point $Z_i$ to which it can be mapped such that the distortion is less than $d_0$.

We consider a modified maximum likelihood estimator to establish an achievable upper bound. Given $G$ and the rate distortion points $Z_j$, we enumerate the set of points $GZ_j \in \mathbb{R}^m$. Then given the observation $Y$ we map it to the nearest point $GZ_j \in \mathbb{R}^{m \times 1}$. Our estimator $\hat{X}(Y)$ then outputs $Z_j$. We refer to Figure 2 for an illustration.
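The estimator described above is a nearest-neighbor search in the measurement domain. The following minimal sketch uses plain Python lists; the helper names, the tiny matrix and the three-point codebook are hypothetical and stand in for the rate distortion quantization points, whose construction is not attempted here.

```python
def matvec(G, x):
    """Multiply an m x n matrix (list of rows) by a length-n vector."""
    return [sum(g * xi for g, xi in zip(row, x)) for row in G]


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def min_distance_decode(G, codebook, y):
    """Return the codebook point Z_j whose projection G Z_j is nearest to y."""
    projected = [matvec(G, z) for z in codebook]
    best = min(range(len(codebook)), key=lambda j: squared_distance(projected[j], y))
    return codebook[best]


# Hypothetical toy example with m = 2 measurements and n = 3 dimensions.
G = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
codebook = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(min_distance_decode(G, codebook, [0.9, 0.1]))
```

The exhaustive search over all quantization points is exponential in $n$ in general, which is consistent with the estimator being an analysis device rather than a practical algorithm.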

Lemma 4.3. Suppose we are given the observation $Y = GX + \frac{N}{\sqrt{\mathrm{SNR}}}$ for the sequence $X = X^n = \{X_1, \dots, X_n\}$ of random variables drawn IID with $X_i \sim P_X$. Let $\hat{X}^n(Y)$ be the reconstruction of $X^n$ from $Y$. Then for any $\epsilon > 0$ we have for sufficiently large $n$,

$$P\left(\|\hat{X}(Y) - X\|^2 \geq 2nd_0\right) \leq (1 - \epsilon)\exp\left\{-\frac{\mathrm{SNR}\,\|G(Z_i - Z_j)\|^2}{32}\right\} 2^{n(R_X(d_0) + \epsilon)} + \epsilon \qquad (27)$$

where $Z_i$ and $Z_j$ are any two quantization points such that $\|Z_i - Z_j\| = 4\sqrt{nd_0}$.

Proof. To compute the probability of error we first consider a pairwise error probability, namely,

$$P_e(i, j) = P\{N : X \in B_i \to Z_j \mid d(B_i, B_j) \geq 2nd_0,\ G\} \qquad (28)$$

where $d(B_i, B_j)$ is the minimum squared distance between any two points $X_i \in B_i$ and $X_j \in B_j$. Under the minimum distance estimator we have,

$$P_e(i, j) = P\left\{N : \left\|GX + \frac{N}{\sqrt{\mathrm{SNR}}} - GZ_i\right\|^2 \geq \left\|GX + \frac{N}{\sqrt{\mathrm{SNR}}} - GZ_j\right\|^2\right\} \qquad (29)$$
Figure 2: The rate distortion cover of the typical set by balls $B$ of radius $\sqrt{nd_0}$. The ML decoding over the set of rate distortion quantization points (identified as centers of the distortion balls) consists of mapping $Y$ to the correct distortion ball for $X$ using a minimum distance decoder. Shown in the figure is a pairwise error event for mapping $X \in B_i$ to the quantization point $Z_j \in B_j$ that is at a set distance of $2nd_0$ from the ball $B_i$ to which $X$ belongs.



where we have omitted the conditioning variables and equations for brevity. Simplifying the expression inside the probability of error we get that,

$$P_e(i, j) = P\left\{\frac{N^T G(Z_j - Z_i)}{\sqrt{\mathrm{SNR}}\,\|G(Z_j - Z_i)\|} \geq \frac{\|G(X - Z_j)\|^2 - \|G(X - Z_i)\|^2}{2\|G(Z_j - Z_i)\|}\right\} \qquad (30)$$

In other words we are asking for the pairwise probability of error in mapping a signal that belongs to the distortion ball $B_i$ to the quantization point $Z_j$ of the distortion ball $B_j$ under the noisy mapping $GX + N$ such that the set (squared) distance between the distortion balls is $\geq 2nd_0$, see Figure 2.

Under the assumption that the noise $N$ is AWGN with unit power in each dimension, its projection $\tilde{N}$ onto the unit vector $\frac{G(Z_j - Z_i)}{\|G(Z_j - Z_i)\|}$ is also Gaussian with unit power. Thus we have

$$P_e(i, j) = P\left(\frac{\tilde{N}}{\sqrt{\mathrm{SNR}}} \geq \frac{\|G(X - Z_j)\|^2 - \|G(X - Z_i)\|^2}{2\|G(Z_j - Z_i)\|}\right) \leq P\left(\frac{\tilde{N}}{\sqrt{\mathrm{SNR}}} \geq \min_{X \in B_i} \frac{\|G(X - Z_j)\|^2 - \|G(X - Z_i)\|^2}{2\|G(Z_j - Z_i)\|}\right)$$

where we have further upper bounded the probability of the pairwise error by choosing the worst-case $X$ that minimizes the distance between the ball $B_i$ and the quantization point $Z_j$ and maximizes the distance from the quantization point $Z_i$ within the distortion ball $B_i$.

For the case of squared distortion and covering via spheres of average radius $d_0$, it turns out that the worst-case $X$ is given by $X = \frac{3Z_i + Z_j}{4}$ with $\|Z_i - Z_j\| = 4\sqrt{nd_0}$. Plugging this value in the expression we have for the worst-case pairwise probability of error that

$$P_e(i, j) \leq P\left(\tilde{N} \geq \frac{\sqrt{\mathrm{SNR}}\,\|G(Z_i - Z_j)\|}{4}\right) \leq \exp\left\{-\frac{\mathrm{SNR}\,\|G(Z_i - Z_j)\|^2}{32}\right\}$$
where the second inequality follows by the standard upper bound to the error function. Now we apply the union bound over the set of rate distortion quantization points $Z_j$ minus the set of points that are the neighbors of $Z_i$ (see Figure 2). The maximum number of such points is given by $N_\epsilon(n, d_0) = 2^{n(R_X(d_0) + \epsilon)}$, where $R_X(d_0)$ is the scalar rate distortion function [18]. Hence we have,

$$P\left(\|\hat{X} - X\|^2 \geq 2nd_0 \,\Big|\, X \in \bigcup_i B_i\right) \leq \exp\left\{-\frac{\mathrm{SNR}\,\|G(Z_i - Z_j)\|^2}{32}\right\} 2^{n(R_X(d_0) + \epsilon)}$$

with $\|Z_i - Z_j\| = 4\sqrt{nd_0}$. To finish the proof we note that with probability $(1 - \epsilon)$, the signal $X$ belongs to one of the balls $B_i$. Thus taking expectations with respect to $X$ the result follows. □
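The worst-case step of the proof can be verified with a few lines of arithmetic. The sketch below assumes the worst-case point $X = (3Z_i + Z_j)/4$ and checks that the decision threshold reduces to $\|G(Z_i - Z_j)\|/4$, so that the Gaussian tail bound $Q(x) \leq e^{-x^2/2}$ gives the exponent $\mathrm{SNR}\,\|G(Z_i - Z_j)\|^2/32$. The helper names, the matrix and the points are hypothetical.

```python
import math


def matvec(G, x):
    """Multiply an m x n matrix (list of rows) by a length-n vector."""
    return [sum(g * xi for g, xi in zip(row, x)) for row in G]


def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(a * a for a in v))


def pairwise_threshold(G, x, zi, zj):
    """(||G(x - zj)||^2 - ||G(x - zi)||^2) / (2 ||G(zj - zi)||)."""
    d_j = norm(matvec(G, [a - b for a, b in zip(x, zj)]))
    d_i = norm(matvec(G, [a - b for a, b in zip(x, zi)]))
    sep = norm(matvec(G, [a - b for a, b in zip(zj, zi)]))
    return (d_j ** 2 - d_i ** 2) / (2.0 * sep)


def pairwise_bound(G, zi, zj, snr):
    """exp(-SNR ||G(zi - zj)||^2 / 32), the assumed pairwise error bound."""
    sep = norm(matvec(G, [a - b for a, b in zip(zi, zj)]))
    return math.exp(-snr * sep ** 2 / 32.0)


def gaussian_tail(x):
    """Q(x) = Pr(N(0, 1) >= x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


G = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 1.0]]
zi, zj = [0.0, 0.0, 0.0], [4.0, 0.0, 4.0]
x_worst = [(3.0 * a + b) / 4.0 for a, b in zip(zi, zj)]
thr = pairwise_threshold(G, x_worst, zi, zj)
print(thr, gaussian_tail(math.sqrt(4.0) * thr), pairwise_bound(G, zi, zj, 4.0))
```

The factor 32 arises as $2 \times 4^2$: the threshold is a quarter of the projected separation, and the tail bound halves the squared argument.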



5 Approximate Recovery: Bayesian Bounds 

In this paper we will consider the following mixture model for explicit evaluation of the bounds.

$$X_i \sim P_X = \alpha\,\mathcal{N}(\mu_1, \sigma_1^2) + (1 - \alpha)\,\mathcal{N}(\mu_0, \sigma_0^2) \qquad (31)$$

i.e., each component $X_i$ of $X$ is IID with the distribution $P_X$ defined above. It is easy to see that for $\mu_1 = 1$, $\mu_0 = 0$ and $\sigma_0 = 0$ this mixture model for large enough $n$ results in an approximately $k = \alpha n$ sparse sequence. We use $\sigma_1 = 0$ to model a binary discrete case and $\sigma_1 = 1$ to model a continuous valued case. It is worth pointing out that this model has been used previously in several papers, e.g. see [22], [1], to probabilistically model sparse signals.



5.1 Discrete X: Support recovery 

It is easy to see that using a binary signal model for $X$ one can address the support recovery problem in the Bayesian setting. Under this case $X$ is drawn IID according to,

$$P_X = \alpha\,\delta(X - \beta) + (1 - \alpha)\,\delta(X), \quad \alpha \leq 0.5 \qquad (32)$$

where $\delta(\cdot)$ is the usual singular measure. Note that it follows from the Asymptotic Equipartition Property (AEP), see [18], that asymptotically the $n$-dimensional probability distribution uniformly concentrates on the set of exactly $k$-sparse sequences $\Xi_{\{0,\beta\}}^{\{k\}}$, i.e. given $\epsilon > 0$, there exists $n$ such that $P_{X^n}\left(\Xi_{\{0,\beta\}}^{\{k\}}\right) \geq 1 - \epsilon$. Thus these bounds can be compared to the worst-case setup of Section 3 when $X \in \Xi_{\{0,\beta\}}^{\{k\}}$, $k = \alpha n$. For this discrete case we have the following main results stated in terms of the scalar rate distortion function $R_X(d_0)$ with Hamming distance as the distortion measure. Note that for this case $R_X(d_0) = H_2(\alpha) - H_2(d_0)$ for $d_0 \leq \alpha$.
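The rate distortion function for this binary model is simple to evaluate. The sketch below computes $R_X(d_0) = H_2(\alpha) - H_2(d_0)$ in bits; the base-2 logarithm is an assumption made here to match the $2^{n(R_X(d_0)+\epsilon)}$ count of quantization points, and the function names and example values are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """H_2(p) in bits, with the convention 0 log 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)


def rate_distortion_binary(alpha: float, d0: float) -> float:
    """R_X(d0) = H_2(alpha) - H_2(d0) for 0 <= d0 < alpha <= 0.5, else 0."""
    if not 0.0 < alpha <= 0.5:
        raise ValueError("alpha must lie in (0, 0.5]")
    if d0 < 0.0:
        raise ValueError("distortion must be non-negative")
    if d0 >= alpha:
        return 0.0
    return binary_entropy(alpha) - binary_entropy(d0)


# Hypothetical example: 10% sparsity, 5% tolerated Hamming distortion.
print(rate_distortion_binary(0.1, 0.05))
```

Allowing distortion $d_0$ equal to the sparsity level $\alpha$ drives the rate to zero, since the all-zero reconstruction already meets that distortion on average.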

Theorem 5.1. Consider the input noise model of Equation p|) and the binary model for X as described 
above. Then, 

a. Necessity: Asymptotically as n — » 00 if m < — X ^ °\ there does not exist any algorithm 

0.51og(l + ap z SNR) 

that recovers the signal to within an average Hamming distortion of do . 



Sufficiencii: Asymptotically as n — » 00, it is sufficient that m > X ^ /„)„ ITr , for the con- 

jj y y f y jj - Q 51og(1+ dofpsim ) 3 

structive ML estimator of section \4-2\ to reliably recover the signal to within Hamming distortion of 
do- 



Proof. To prove part (a) note that, from Lemma 4.2, for the probability of error to approach zero the numerator in the lower bound must approach zero. This implies that we need,



IL < ^( X ' Y I G ) 



< m \ 1 a (33) 

— t-> / 7 \ 1 log na v J 



To this end recall that $Y = G\left(X + \frac{1}{\sqrt{\mathrm{SNR}}}N\right)$. Consider the SVD of $G = USV^*$, where $U$, $V$ are orthonormal matrices and $S = [D\ 0]$, with $D$ a positive diagonal matrix. From [20] it follows that $U$, $S$, $V$ are independent random matrices. Furthermore $U$ and $V$ are isotropically random. By linearly transforming $Y$ through pre-multiplication by $D^{-1}U^*$ we get an equivalent system of equations with

$$\tilde{Y} = V_1^* X + \frac{1}{\sqrt{\mathrm{SNR}}} V_1^* N \qquad (34)$$

where $V_1^*$ is the matrix formed from the first $m$ rows of $V^*$. Now note that since the rows of $V_1^*$ are orthogonal and normalized, $\tilde{N} = \frac{1}{\sqrt{\mathrm{SNR}}} V_1^* N$ is IID Gaussian with each component having zero mean and variance $1/\mathrm{SNR}$. This transformation implies that $I(X; Y | G) = I(\tilde{Y}; X | V_1)$ since $V$ is independent of $U$ and $S$. Now by direct computation it follows that,

$$E_V I(\tilde{Y}; X \mid V_1) \leq h(\tilde{Y} \mid V_1) - h(\tilde{Y} \mid V, X) \leq \frac{m}{2}\log(1 + \mathrm{SNR}\,\alpha\beta^2)$$

where to get the last inequality we have used the fact that $h(\tilde{Y} \mid V, X)$ is the entropy of the noise $\tilde{N}$, and for the first term, $h(\tilde{Y} \mid V_1)$, we have used the fact that a Gaussian distribution maximizes the entropy over all other random variables with zero mean and identical variance [18]. Finally, for sufficiently large $n$ the term $\frac{1}{n} + \frac{\log(nd_0)}{n}$ can be made arbitrarily small and the result follows.



We will now prove part (b). In order to simplify the derivation we again focus on Equation (34). Following the proof of Lemma 4.3, the pairwise error can now be computed as follows



,(^)<FU> ^E^^ |V,l<exp(- jggig^M] , (35) 
To compute the error probability we will need to take the expectation over $V_1$ and apply the union bound to bound the error probability over all error patterns. To simplify the expectation over $V_1$ we let,

OT^ p^-tf } (36) 

where $D$ is a positive diagonal random matrix independent of $V_1$. Note that our problem reduces to bounding the expectation of $\phi(I_m, V_1)$ over $V_1$. Note that when $\sigma_{\max}(D) \leq 1$ we have $\phi(I_m, V_1) \leq \phi(D, V_1)$. Next, note that trivially we have,

$$\phi(I_m, V_1)\,I_{\{\sigma_{\max}(D) \leq 1\}} \leq \phi(D, V_1)\,I_{\{\sigma_{\max}(D) \leq 1\}} + \phi(D, V_1)\,I_{\{\sigma_{\max}(D) > 1\}} \qquad (37)$$

where $I_{\{\cdot\}}$ denotes the indicator function. Consequently, we can take expectations over the two independent matrices $D$ and $V_1$ to obtain,

EvMIm, V 1 )))Pr 6(<7 max (D) < 1) < £ D)Vl exp |- SNR W DV i^ Z Z ^ j ( 38 ) 

Note that we can introduce an isotropically random unitary matrix $U$, namely, $\exp\left\{-\frac{\mathrm{SNR}}{32}\|DV_1^*(Z_i - Z_j)\|^2\right\} = \exp\left\{-\frac{\mathrm{SNR}}{32}\|UDV_1^*(Z_i - Z_j)\|^2\right\}$, without modifying the result. Now the matrix $H = UDV_1^*$ can be identified with a suitable IID Gaussian matrix when $U$, $D$, $V$ are chosen independently and $U$ and $V$ are chosen uniformly from the set of all unitary matrices; the positive diagonal matrix $D$ is distributed according to the distribution of singular values of a Gaussian matrix. To ensure a tight approximation we need to choose a Gaussian matrix such that $P(\sigma_{\max}(D) \leq 1)$ approaches one. This can be accomplished by choosing $H$ as an

IID Gaussian ensemble with each component ~ jV(0, ^= — ). Then following similar steps as in the 

proof of Lemma |4.3| we arrive at a similar upper bound, 

P(||X - X|| 2 > 2nd ) < (1 - e) exp ^ ^M^ZZ^t X 2 ™(*xW+e) + e 

where, ||Zj — Zj|| = 4^/nrfo- Since e is arbitrary, the result then follows by taking expectation with respect 
to $H$ and using the moment generating function of the $\chi^2$ random variable [23]. □

Theorem 5.2. Consider the output noise model of Equation and the binary model for X as described above. Then,

a. Necessity: Asymptotically as $n \to \infty$, if $m \le \dfrac{n R_X(d_0)}{0.5 \log\left(1 + \frac{n}{m} \alpha \beta^2 SNR\right)}$ there does not exist any algorithm that recovers the signal to within an average Hamming distortion of $d_0$.

b. Sufficiency: Asymptotically as $n \to \infty$ it is sufficient that $m \ge \dfrac{n R_X(d_0/2)}{0.5 \log\left(1 + \frac{n}{2m} d_0 \beta^2 SNR\right)}$ for the constructive ML estimator of Section 4.2 to reliably recover the signal to within a Hamming distortion of $d_0$.



Proof. The proof of part (a) follows along the same lines as that of Theorem 5.1, with the following modification to the upper bound on the mutual information,

$$E_G\, I(X; Y \mid G) \le \frac{m}{2} \log\left(1 + \frac{n \alpha \beta^2 SNR}{m}\right) \quad (39)$$

The proof of part (b) follows from the upper bound on the probability of error in Lemma 4.3 by taking the expectation with respect to $G$ and using the moment generating function of the $\chi^2$ random variable, see [23]. □
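Because $m$ appears on both sides of the conditions in Theorem 5.2, the threshold has to be found numerically. The following Python sketch does this by bisection for part (b), under the assumption that the condition reads $\frac{m}{2}\log\left(1 + \frac{n d_0 \beta^2 SNR}{2m}\right) \ge n R_X(d_0/2)$ with natural logarithms. The function names, the search cap and the example numbers are illustrative and not taken from the paper; the rate $R_X(d_0/2)$ is passed in as a plain number.

```python
import math

def capacity_side(m, n, d0, beta, snr):
    """Left-hand side (m/2) * log(1 + n*d0*beta^2*snr / (2m)), in nats."""
    return 0.5 * m * math.log(1.0 + n * d0 * beta**2 * snr / (2.0 * m))

def min_measurements(n, rate, d0, beta, snr, m_max=None):
    """Smallest integer m with capacity_side(m) >= n * rate.

    Returns None if no m <= m_max satisfies the condition.
    (m/2) * log(1 + a/m) is increasing in m, so bisection over integers is valid.
    """
    target = n * rate
    if m_max is None:
        m_max = 100 * n  # arbitrary search cap (assumption for this sketch)
    if capacity_side(m_max, n, d0, beta, snr) < target:
        return None
    lo, hi = 1, m_max
    while lo < hi:
        mid = (lo + hi) // 2
        if capacity_side(mid, n, d0, beta, snr) >= target:
            hi = mid
        else:
            lo = mid + 1
    return lo

# Hypothetical numbers: n = 1000, R_X(d0/2) = 0.1 nats, d0 = 0.05, beta = 1, SNR = 400.
m_star = min_measurements(1000, 0.1, 0.05, 1.0, 400.0)
```

Since $(m/2)\log(1 + a/m)$ saturates at $a/2$ as $m$ grows, the search returns `None` whenever $d_0 \beta^2 SNR / 4 \le R_X(d_0/2)$, which matches the minimum-SNR requirement derived in the corollary that follows.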

We will now reduce the implicit expression in the above Theorem to derive explicit conditions on the number of measurements $m$. To this end we have the following corollary.

Corollary 5.1. Consider the output noise model of Equation and the binary model for X as described above. Then, (a) asymptotically as $n \to \infty$, if $SNR \le \frac{2 R_X(d_0)}{\alpha \beta^2}$ and $m \le 2 n R_X(d_0)$ there exists no algorithm that can recover X to within an average Hamming distortion of $d_0$; (b) on the other hand, asymptotically as $n \to \infty$ it is sufficient that $SNR \ge \frac{200 R_X(d_0/2)}{d_0 \beta^2}$ with $m \ge 2.08\, n R_X(d_0/2)$ for the constructive ML estimator of Section 4.2 to recover X to within an average Hamming distortion of $d_0$.

Proof. We first focus on the sufficient conditions. Let $c = \frac{n R_X(d_0/2)}{m}$ and $\eta = \frac{d_0 \beta^2 SNR}{2 R_X(d_0/2)}$. Then from part (b) of Theorem 5.2 we have as a sufficient condition that,

$$f(c) = 0.5 \log(1 + c\eta) - c \ge 0 \quad (40)$$

In particular we want to find $\max\{c \mid f(c) \ge 0\}$. To this end note that $f(c) = 0$ at $c = 0$. Also, for there to exist any positive $c$ such that $f(c) \ge 0$ it is required that $\eta \ge 2$; in particular, $\eta \ge 2$ is the condition for a positive derivative near zero, since $f'(0) = \eta/2 - 1$. This implies that $\frac{d_0 \beta^2 SNR}{2 R_X(d_0/2)} \ge 2$, or $SNR \ge \frac{4 R_X(d_0/2)}{d_0 \beta^2}$. Given that this condition is satisfied, $c = \frac{1}{2}\left(1 - \frac{4}{\eta}\right)$ lies in the feasible region. Therefore $m \ge \frac{2}{1 - 4/\eta}\, n R_X(d_0/2) \approx 2 n R_X(d_0/2)$ is a sufficient condition for reliable recovery for some sufficiently large $\eta > 2$, i.e. for $SNR > \frac{4 R_X(d_0/2)}{d_0 \beta^2}$. In particular, if we choose $\eta = 100$ then $SNR \ge \frac{200 R_X(d_0/2)}{d_0 \beta^2}$ and $m \ge 2.08\, n R_X(d_0/2)$ is sufficient for reliable recovery.



Analyzing part (a) of Theorem 5.2 in a similar manner, one can show that if $SNR \le \frac{2 R_X(d_0)}{\alpha \beta^2}$ and $m \le 2 n R_X(d_0)$ there exists no algorithm that can reliably recover X to within the desired distortion level. □
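The feasibility condition of Equation (40) is easy to check numerically. The sketch below, in Python with natural logarithms, evaluates $f(c) = 0.5\log(1 + c\eta) - c$ at $\eta = 100$ and $c = 0.48$ (the value matching the constant $2.08 \approx 1/0.48$ in the corollary) and also locates the largest feasible $c$ by bisection. The helper names are illustrative only.

```python
import math

def f(c, eta):
    """f(c) = 0.5 * log(1 + c * eta) - c, natural logarithm."""
    return 0.5 * math.log(1.0 + c * eta) - c

def largest_feasible_c(eta, tol=1e-12):
    """Largest c > 0 with f(c) >= 0; returns 0.0 when eta <= 2.

    f is concave with f(0) = 0 and f'(0) = eta/2 - 1, so for eta > 2 it has
    exactly one positive root, located to the right of its maximiser.
    """
    if eta <= 2.0:
        return 0.0
    lo = 0.5 - 1.0 / eta  # maximiser of f (solves f'(c) = 0), where f > 0
    hi = 1.0
    while f(hi, eta) > 0.0:
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid, eta) >= 0.0:
            lo = mid
        else:
            hi = mid
    return lo

eta = 100.0
c_closed_form = 0.48
measurement_factor = 1.0 / c_closed_form   # multiplies n * R_X(d0/2)
c_max = largest_feasible_c(eta)
```

The closed-form choice is feasible but conservative: the numerically largest feasible $c$ at $\eta = 100$ is noticeably larger than $0.48$, so the constant $2.08$ should be read as a convenient sufficient value rather than the smallest one the condition allows.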

Remark 5.1. One immediate observation from the above analysis is that, unlike the worst-case set-up, one can indeed trade off the number of measurements against distortion in the Bayesian set-up.

5.2 Continuous X: $\ell_2$ recovery

Under this case X is drawn IID according to,

$$P_X = \alpha\, \mathcal{N}(0, \beta^2) + (1 - \alpha)\, \delta(X) \quad (41)$$

For this case we have the following main results. The results are stated in terms of the scalar rate distortion function $R_X(d_0)$ given by $R_X(d_0) = H_2(\alpha) + \frac{\alpha}{2} \log \frac{\alpha}{d_0}$ for $d_0 \le \alpha$ (see Section 8.4 for the derivation of this result). Notice in the following that, in contrast to the discrete case where $d_0 \le \alpha$, here we impose $d_0 \le \alpha/2$, and for reasonable reconstruction one typically desires $d_0 = \epsilon\alpha$ for some small $\epsilon > 0$. The reason that we require $d_0 \le \alpha/2$ is the additional term $K(n, d_0)$ in the modified Fano's inequality of Lemma 4.1, which appears in the continuous setting.

Theorem 5.3. Consider the input noise model of Equation and the mixture model for X as described above. Then,

a. Necessity: Asymptotically as $n \to \infty$, if $m \le \dfrac{n\left(R_X(d_0) - \frac{\alpha}{2} \log 2\right)}{0.5 \log(1 + \alpha \beta^2 SNR)}$ there does not exist any algorithm that recovers the signal to within an average $\ell_2$ distortion of $d_0$.

b. Sufficiency: Asymptotically as $n \to \infty$ it is sufficient that $m \ge \dfrac{n R_X(d_0/2)}{0.5 \log\left(1 + \frac{d_0 \beta^2 SNR}{2}\right)}$ for the constructive ML estimator of Section 4.2 to reliably recover the signal to within an average $\ell_2$ distortion of $d_0$.



Proof. For part (a) first note that from Theorem 5.1 we have $E_G I(X; Y \mid G) \le \frac{m}{2} \log(1 + \beta^2 \alpha SNR)$. From Lemma 4.1 it follows that for feasibility of recovery to within distortion $d_0$ (asymptotically) it is required that,

$$\frac{n}{m} \le \frac{\frac{1}{m} E_G I(X; Y \mid G)}{R_X(d_0) - K(d_0, n)} \quad (42)$$

The result then follows by noting that $|K(d_0, n) - 0.5 \alpha \log 2| \le \epsilon$ with $\epsilon$ arbitrarily small for large enough $n$, see e.g. [23]. Note that for the case at hand, in order for the expression $R_X(d_0) - K(d_0, n)$ to remain positive and hence meaningful, we need $d_0 \le \alpha/2$. The proof of part (b) follows exactly along the same lines as the proof of part (b) in Theorem 5.1. □



Note that unlike the case of support recovery, where the number of measurements had to grow with the signal dimension even with an SNR of $\log(n)$, here we see that the number of measurements does scale with the distortion for moderate signal-to-noise ratios. This may be acceptable in cases where a probability model for the signal set is available.

Theorem 5.4. Consider the output noise model of Equation and the mixture model for X as described above. Then,

a. Necessity: Asymptotically as $n \to \infty$, if $m \le \dfrac{n\left(R_X(d_0) - \frac{\alpha}{2} \log 2\right)}{0.5 \log\left(1 + \frac{n}{m} \alpha \beta^2 SNR\right)}$ there does not exist any algorithm that recovers the signal to within an average $\ell_2$ distortion of $d_0$.

b. Sufficiency: Asymptotically as $n \to \infty$ it is sufficient that $m \ge \dfrac{n R_X(d_0/2)}{0.5 \log\left(1 + \frac{n}{2m} d_0 \beta^2 SNR\right)}$ for the constructive ML estimator of Section 4.2 to reliably recover the signal to within an average $\ell_2$ distortion of $d_0$.

Proof. The proof is similar to the proof of Theorem 5.3. □

It is easy to see that Corollary 5.1 holds true for this case too, with appropriate modifications to the necessary conditions in terms of $R_X(d_0) - \frac{\alpha}{2} \log 2$ instead of $R_X(d_0)$.



5.3 Comparison between Worst-Case and Bayesian Setups 

Based on the worst-case and Bayesian results we can comment on the main differences. The situation is slightly complicated since we considered two different types of distortion in these cases. We recall the items (A) through (D) listed at the beginning of Section 4 as a means for comparison. Note that by adopting a Bayesian setup we no longer need the minimum singular value of sub-matrices of $G$ to be uniformly bounded away from zero. This can be attributed to the fact that we are taking the expectation with respect to $G$ in Equation (27). However, note that the number of quantization points $N_\epsilon(n, d_0)$ in Theorem 8.1 will go to infinity if we insist on nearly exact support recovery. Second, note that the number of measurements does scale with the distortion level: the larger the admissible distortion, the smaller the number of measurements. This is even more surprising for input noise models, since in the worst-case setup we required the number of measurements to scale with the signal dimension. Finally, for signal reconstruction to within a distortion level $d_0$ we only need a constant SNR, in contrast to the worst-case setup. However, this can be attributed to the fact that our mean-squared distortion metric is less stringent than support errors.



6 Appendix 

6.1 Proof of Lemma 3.3

Consider any arbitrary G and N. Let for each X € 31 denote by Px the observed distribution of Y 
given X as induced by the relation Y = GX + N. We next consider the equivalence class of all sequences 
with the same support and lump the corresponding class of observation probabilities into a single composite 
hypothesis, i.e., 

[X] = {X' G I Supp(X') = Supp(X)} (43) 
Each equivalence class bears a one-to-one correspondence with binary valued k-sparse sequences, 

Hgjjj ={Xe ~f } \Xi = &, ie Supp(X)} (44) 
Our task is to lower bound the worst-case error probability 

P e , G = mm max P x ( [X] + [X] | G) (45) 

X XGH< fc > 

Now note that, 

max P X ([X] ^ [X]|G) > max P X ([X] ^ [X]|G) = max P X (X ^ X, Xe 5 {0 ,/3 } |G) (46) 

XGH^ fc} YcrW Ye- {fc} 

This implies that 



XeL {o,/3> 



G = min max P x {Supp(X) ^ Supp(X)|G} (47) 



XeE« XeE« 



> min max P X (X ± X, X G ~g> a} |G) (48) 



XeS p XeL {o,^} 



max P X (X^X, X G sg> |G) (49) 



> min max P X (X ^ X, X G s{l|G) (50) 

6.2 Proof of Lemma 3.5

Denote by $\mathcal{E}_{1\omega}$ the error event that a signal from the $\omega$-th support set is more likely, i.e.,

£i u = <N: min j| Y - G s X s || 2 < min || Y - G s X|| 2 , w/0, > (51) 

w£lX™>/3/2 x J 

In the following we will drop stating the obvious fact that uEl Now note that, 



Si=\J Siu, (52) 
We first upper bound $\mathcal{E}_{1\omega}$ by a more manageable event, namely,



J- W = ^N: min min || Y - G So ^X So ^ ~ G Sqc „X Sqc J| 2 < min || Y - G So X|| 2 , u ± (53) 

I X ™o" ^P/ 2Xs °,» x J 

It is clear that, 

£i u C T„ (54) 

This is because the signal on the common support $S_{0,\omega}$ is relaxed to take on any value and not necessarily those that are bounded away from zero by $\beta/2$. We will now simplify the events $\mathcal{F}_\omega$ by analytically carrying out the unconstrained minimizations. Recall that $Y = GX + N$. Let $X^0_{S_0}$ denote the true signal $X_0$ restricted to its support. Then $Y = G_{S_0} X^0_{S_0} + N$. Note that $X^0_{S_0}$ is composed of $X^0_{S_{0,\omega}}$ corresponding to the overlap and $X^0_{S_{0,\omega^c}}$ corresponding to the misses. We have the following Lemma.

Lemma 6.1. For m > 2k + 1 

J U CJ U = (J {N : 2N T n 1 G'X' > HLLG'X'II 2 , } (55) 

X" in >,3/2,X°' min >B 

where 

^ = (I-Gso.JG^G^J-^^J (56) 

is a projection operator and 



G' = [Gs oeiW G Soi „ e ],X' = 
Proof. Consider the error region, 



, Xg£ w > 0/2, X°; m j > /3 (57) 



F u = lN: min min j| Y - G So ^X So u - G s „. w X So „ J| 2 < min || Y - G So X|| 2 , u ? (58) 
I x v,„>^ 2Xs »,» x J 

Fixing X5 0C u we perform the inner minimization first on the L.H.S in the above equation. It can be shown 
that the inner minimum is achieved at, 

X° So ^ -X S0| „ =-(G^G Soi J- 1 G^(N + Gs ,^X So ^ - G So ^X Soo _ J (59) 

Also the unconstrained minimum on the R.H.S. is given by, 

min||(Y-G 5o X)|| 2 =N T n N (60) 
x 

where 

n = (I-G So (G^G So )- 1 G^ o ) (61) 
is a projection operator. Substituting these results in the expression for we obtain, 

J^=iN: min (G'X') T n 1 G / X / - 2N T II 1 G'X / + N T (n - II^N < \ (62) 

A simple application of the matrix lemma shows that $(\Pi_1 - \Pi_0)$ is a positive semi-definite matrix: the columns of $G_{S_{0,\omega}}$ are a subset of those of $G_{S_0}$, so $\Pi_0$ projects onto a subspace of the range of $\Pi_1$. This implies $N^T (\Pi_1 - \Pi_0) N \ge 0$ for all $N$. Ignoring this non-negative term can only increase the probability of error. Therefore ignoring this term we obtain,



J r u ={N: min (G'X') 1 IIiG'X' - 2N J LLG'X' < } (63) 
I X W^ /2 J 

C (J {N : (G'X / ) T niG / X / - 2N T niG'X' < 0} (64) 

X^^>/3/2,X So ^ c >/3 

(J {N : 2N T IIiG'X' > ||riiG'X'|| 2 } = T u (65) 

X^^>/3/2,X So ^ c >/3 

where the last equality follows from the fact that III is a projection. Now note that if any column of G' 
falls into the null space of Gg then probability of the event T u is 1 and therefore the probability of error 
is 1 in the worst case. This will not happen as long as G is full rank and m > 2k + 1. □ 

We now have the following Lemma. 
Lemma 6.2. 

L 

f u C C u = U {N : 2N T g ;.X' > ( x min ((G') T G')X' 2 , \X'\ = (3/2} (66) 

where $L = |S_{0^c,\omega} \cup S_{0,\omega^c}|$ is the total number of location errors and $\sigma_{\min}((G')^T G')$ is the minimum singular value of the matrix $(G')^T G'$.

Proof. Let G = LLC. Then note that for any X', 

{N : 2N T GX' > ||G X'|| 2 } C {n : 2N T GX' > a min (G T G)||X'|| 2 } (67) 

Now note that 

where gj is the j-th column of the matrix G and X' = [X[, . . . , X'-, . . . , X' L ] T . Note also that 

||X'|| 2 = ^|^| 2 

3 

By a simple superposition of events this implies that 

L 

IJ |J {N : 2N T g,X; > a min (G T G)|Xj| 2 } 

X»» w >/3/2,X^ c >/3J=l 

C |J (J {N : 2N T ^X> > a min (G T G)\X'j\ 2 } (68) 

X™ in >/3/2.X°„' min >/3J = 1 



C |J {N : 2N T g,X' > a min (G T G)|X'| 2 : \X'\ = [3/2} (69) 

where the last inequality follows from the fact all the events with X' > (3/2 are contained in the event 
X' = (3/2. Now note that since LL is a projection and m > 2k + 1 and L < 2k it implies o- min (G T G) = 
a min {{G') T G'). This implies that, 

L 

K C (J {N : 2N T g,X' > a min (G T G)|X'| 2 : - 0/2} (70) 
= |J [J {N : 2N T g^X' > crG,min((G / ) T G / )|X / | 2 } (71) 

X'=±/3/2 j=l 
Since $L \le 2k \le n$ and $\{g'_1, \ldots, g'_L\} = \{g_i : i \in S_{0^c,\omega} \cup S_{0,\omega^c}\} \subseteq \{g_1, \ldots, g_n\}$,

n 

J^C U Q {N : 2N T g 7 X' > a G , min \Xf} (72) 

= Cu (73) 

where er G: mm = mni S: |s|<2/c o- min (Gf G s ). □ 
The result then follows by noting that, 

fi=|J^c|Jj- w c|J/: w (74) 

Ul UJ Ul 

and replacing the notation X' by X. 



7 Proof of Lemma 3.6

From Lemma 3.5 we have,



n n 

P(£i)< |J U {N:2N T g,X>a G , nin yMR|X| 2 } = |J Q { W : 2W T E 1 / 2 g J -X > c7 G)IIlin VMR|X| 



X=±P/2j=l X=±/3/2j=l 

X 2 



(J (J < w : 2wX > VSNRa G . m 

X=±P/2j = l I \/gj S gj 



Note that W is IID normally distributed Gaussian vector and we let w = — . „ 3 . Next noting that 
1 1 Sill = 1 Vj we have, 



10 : 2wX > VSNRa G ,mm = \ Q I w : 2wX > V SNRa G , min J - — ^^rX 2 \ (75) 



|gjs gj |i II" ' \ V-(S) 



[w : 2wX > ^cmini/UfS- 1 )! 2 } (76) 



We now apply the union bound over all the possible $2n$ error events corresponding to each $j \in \{1, 2, \ldots, n\}$ and $X = \pm\beta/2$ and obtain,



i) < P |w : u; > V^toC^y^G.min ^ j exp(log2n) (77) 

= — / exp(-y 2 /2)^ •exp(log2n) (78) 

V27T y^A mijl (S-i)a G ,» ln ^2 

-A min (S~ 1 )cr 2 , !min 32 Uxp{log2n} (79) 

Note that the probability is only taken over the noise $W$ ($N$), as $G$ is given and is fixed. Here we have used the bound $Q(x) \le \exp(-x^2/2)$ for the standard error function defined as $Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^\infty \exp(-y^2/2)\, dy$.
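The two ingredients of this step, the Gaussian tail bound $Q(x) \le \exp(-x^2/2)$ and the union bound over $2n$ events, can be sanity-checked numerically. The following Python sketch uses only the standard library; the function names are illustrative and the threshold `x` stands in for whatever argument the tail is evaluated at.

```python
import math

def q_function(x):
    """Gaussian tail Q(x) = (1/sqrt(2*pi)) * integral from x to infinity of exp(-y^2/2) dy."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def union_tail_bound(x, n):
    """Union bound over 2n events, each bounded by exp(-x^2/2), clipped at 1."""
    return min(1.0, 2 * n * math.exp(-0.5 * x * x))

# The exponential bound dominates the exact tail for every x >= 0.
checks = [(x, q_function(x), math.exp(-0.5 * x * x)) for x in (0.0, 0.5, 1.0, 2.0, 4.0)]
```

Writing the factor $2n$ as $\exp(\log 2n)$, as in (77) to (79), shows that the bound is non-trivial only once $x^2/2$ exceeds $\log 2n$.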



8 Proof of Lemma 3.7 



For any $X_0$ supported on the submatrix $G_{S_0}$, the probability of the error event $\mathcal{E}_2$ is given by,

,) -P{N : IKG^GsoJ^G^NlU > sfSNRp/2} (80) 
To this end let G So = US So V*,U e C mxfc ,V* e C fexm . Then (Gg o Gg ) _1 G|' D = US^V*. Then let
N = V*N. Then since V is orthonormal matrix N has the same distribution as that of N. Now note that 
if N~ Af(0,E), then 

pjllUE^NlU > SNRP/2] < f^pjlKUS^S^w),!! > ^) (81) 

i=l ^ ' 

( < } 2mexp |_^! AB , n( s-i)^ Soimiii | (82 ) 

< exp < 1 h log 2n > (83) 

where (a) follows from the following facts applied in succession- (1) Maximum variance among the noise 
components (US^S^W), is given by imin A max (S 1/2 ) and A^E 1 / 2 ) = A mi „(Sr 1/2 ); (2) Q(x) < 

g-^ 2 / 2 f or the standard error function defined as Q(x) = ^= exp(— x 2 /2)dx. (b) follows from the fact 
that m < n and cr G ,min < CG So ,min- 

8.1 Proof of Theorem EOl 

We follow along the lines of the proof for the deterministic case presented in Section 3.2. Basically we modify Lemma 3.5. We follow the same steps up to Lemma 6.2. Then, following similar algebraic steps as used in Lemma 6.2, it turns out that the support error events with Hamming distortion $\ge 2 k d_0 + 1$ are almost contained in the union of support error events with Hamming distortion $k d_0 \le d_H \le 2 k d_0$. Then in this case the upper bound in Proposition 3.2 is modified to,



>e|G < 2exp j-IT 1 )^ / M ° 2 SNR } e ^m,M (g4) 



The result for Gaussian G is then identical to the development in Section 3.3 



8.2 Proof of Lemma 4.1

Let $X^n = \{X_1, \ldots, X_n\}$ be an IID sequence where each variable $X_i$ is distributed according to a distribution $P_X$ defined on the alphabet $\mathcal{X}$. Denote by $P_{X^n} = (P_X)^n$ the $n$-dimensional distribution induced by $P_X$. Let the space $\mathcal{X}^n$ be equipped with a distance measure $d(\cdot, \cdot)$, with the distance in $n$ dimensions given by $d(X^n, Z^n) = \sum_{k=1}^{n} d(X_k, Z_k)$ for $X^n, Z^n \in \mathcal{X}^n$. For this setting we have the following Theorem taken from

Theorem 8.1. Given $\epsilon > 0$, there exists a set of points $\{Z_1^n, \ldots, Z^n_{N_\epsilon(n, d_0)}\} \subset \mathcal{X}^n$ such that,

$$P_{X^n}\left( \bigcup_{i=1}^{N_\epsilon(n, d_0)} B_i \right) \ge 1 - \epsilon \quad (85)$$

where $B_i = \{X^n : \frac{1}{n} d(X^n, Z_i^n) \le d_0\}$, with the property that $\frac{1}{n} \log N_\epsilon(n, d_0) \le R_X(d_0) + \epsilon$. This implies that for all $X^n$ there exists a mapping $f(X^n) : X^n \to Z_i^n$ such that $P\left(\frac{1}{n} d(X^n, Z_i^n) \le d_0\right) \ge 1 - \epsilon$.

Now we are given that there is an algorithm $\hat{X}^n(Y)$ that produces an estimate of $X^n$ given the observation $Y$. To this end define an error event on the algorithm as follows,

$$E_n = \begin{cases} 1 & \text{if } \frac{1}{n} d(X^n, \hat{X}^n(Y)) \ge d_0 \\ 0 & \text{otherwise} \end{cases}$$

Now, consider the following expansion,

$$H(f(X^n), E_n \mid Y) = H(f(X^n) \mid Y) + H(E_n \mid f(X^n), Y) = H(E_n \mid Y) + H(f(X^n) \mid E_n, Y)$$

This implies that

$$H(f(X^n) \mid Y) \le H(E_n) + H(f(X^n) \mid E_n, Y)$$

Note that since $H(E_n) \le 1$,

$$H(f(X^n) \mid Y) \le 1 + P_e\, H(f(X^n) \mid Y, E_n = 1) + (1 - P_e)\, H(f(X^n) \mid Y, E_n = 0) \quad (86)$$

Note that by construction $H(f(X^n) \mid Y, E_n = 1) \le \log N_\epsilon(n, d_0)$ and $(1 - P_e)\, H(f(X^n) \mid Y, E_n = 0) \le (1 - P_e) \log(|\mathcal{S}|)$, where $\mathcal{S}$ is the set given by,

S = {i : d set < nd } 

where $d_{set}(S_1, S_2) = \min_{s \in S_1, s' \in S_2} d_n(s, s')$ is the set distance between two sets. Now note that $H(f(X^n) \mid Y) = H(f(X^n)) - I(f(X^n); Y) \ge H(f(X^n)) - I(X^n; Y)$, where the inequality follows from the data processing inequality over the Markov chain $f(X^n) \leftrightarrow X^n \leftrightarrow Y$. Thus we have,

$$P_e \ge \frac{H(f(X^n)) - \log|\mathcal{S}| - I(X^n; Y) - 1}{\log N_\epsilon(n, d_0) - \log|\mathcal{S}|} \ge \frac{I(f(X^n); X^n) - \log|\mathcal{S}| - I(X^n; Y) - 1}{n (R_X(d_0) + \epsilon)}$$

The proof then follows by noting that by definition of the rate distortion function $I(f(X^n); X^n) \ge n R_X(d_0)$ (see [15]) and by identifying $K(n, d_0) = \frac{1}{n} \log|\mathcal{S}|$.

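8.3 Proof of Lemma 4.2

Proof. Define the error event,

$$E = \begin{cases} 1 & \text{if } \frac{1}{n} d_H(X^n, \hat{X}^n(Y)) \ge d_0 \\ 0 & \text{otherwise} \end{cases}$$

Expanding $H(X^n, E \mid Y)$ in two different ways we get that,

$$H(X^n \mid Y) \le 1 + n P_e \log(|\mathcal{X}|) + (1 - P_e)\, H(X^n \mid E = 0, Y)$$

Now the term

$$(1 - P_e)\, H(X^n \mid E = 0, Y) \le (1 - P_e) \log \sum_{j=0}^{n d_0 - 1} \binom{n}{n d_0 - j} (|\mathcal{X}| - 1)^{n d_0 - j} \quad (89)$$

$$\le (1 - P_e) \log\left( n d_0 \binom{n}{n d_0} (|\mathcal{X}| - 1)^{n d_0} \right) \quad (90)$$

$$\le n (1 - P_e) \left( H_2(d_0) + d_0 \log(|\mathcal{X}| - 1) + \frac{\log n d_0}{n} \right) \quad (91)$$

where the second inequality follows from the fact that $d_0 \le 1/2$ and $\binom{n}{n d_0 - j} (|\mathcal{X}| - 1)^{n d_0 - j}$ is a decreasing function in $j$ for $d_0 \le 1/2$. Then we have for the lower bound on the probability of error that,

$$P_e \ge \frac{H(X^n \mid Y) - n\left(H_2(d_0) + d_0 \log(|\mathcal{X}| - 1) + \frac{\log n d_0}{n}\right) - 1}{n \log(|\mathcal{X}|) - n\left(H_2(d_0) + d_0 \log(|\mathcal{X}| - 1) + \frac{\log n d_0}{n}\right)}$$

Since $H(X^n \mid Y) = H(X^n) - I(X^n; Y)$ we have

$$P_e \ge \frac{n\left(H(X) - H_2(d_0) - d_0 \log(|\mathcal{X}| - 1) - \frac{\log n d_0}{n}\right) - I(X^n; Y) - 1}{n \log(|\mathcal{X}|) - n\left(H_2(d_0) + d_0 \log(|\mathcal{X}| - 1) + \frac{\log n d_0}{n}\right)}$$

It is known that $R_X(d_0) \ge H(X) - H_2(d_0) - d_0 \log(|\mathcal{X}| - 1)$, with equality iff

$$d_0 \le (|\mathcal{X}| - 1) \min_x P_X(x)$$

see e.g. [25]. Thus for values of distortion $d_0$ satisfying

$$d_0 \le \min\left\{ 1/2,\ (|\mathcal{X}| - 1) \min_x P_X(x) \right\} \quad (92)$$

we have for all $n$,

$$P_e \ge \frac{n R_X(d_0) - I(X^n; Y) - 1 - \log n d_0}{n \log(|\mathcal{X}|) - n\left(H_2(d_0) + d_0 \log(|\mathcal{X}| - 1) + \frac{\log n d_0}{n}\right)}$$

□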
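The counting step behind (89) to (91) bounds the log-size of a Hamming ball by an entropy expression. A minimal Python sketch that checks this numerically is given below, assuming natural logarithms, an alphabet of size `q`, and a ball that includes the radius itself (a slightly larger set than the sum in (89)). The function names and the test values are illustrative, not taken from the paper.

```python
import math

def binary_entropy(p):
    """Binary entropy H_2(p) in nats, with H_2(0) = H_2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def log_hamming_ball(n, radius, q):
    """Exact log of the number of length-n sequences within Hamming distance `radius`."""
    total = sum(math.comb(n, j) * (q - 1) ** j for j in range(radius + 1))
    return math.log(total)

def entropy_bound(n, radius, q):
    """n * (H_2(d0) + d0 * log(q - 1) + log(n * d0) / n) with d0 = radius / n, radius >= 1."""
    d0 = radius / n
    return n * (binary_entropy(d0) + d0 * math.log(q - 1)) + math.log(radius)

# Hypothetical example: n = 200 symbols, radius n * d0 = 40, quaternary alphabet.
exact = log_hamming_ball(200, 40, 4)
bound = entropy_bound(200, 40, 4)
```

The bound is loose by at most a term of order $\log n$, which is why it vanishes after normalizing by $n$ in the final expression.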



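8.4 Rate distortion function for the mixture Gaussian source under squared distortion measure

It has been shown in [22] that the rate distortion function for a mixture of two Gaussian sources with variances given by $\sigma_1^2$ with mixture ratio $\alpha$ and $\sigma_0^2$ with mixture ratio $1 - \alpha$ is given by

$$R_{mix}(D) = \begin{cases} H_2(\alpha) + \frac{1 - \alpha}{2} \log\left(\frac{\sigma_0^2}{D}\right) + \frac{\alpha}{2} \log\left(\frac{\sigma_1^2}{D}\right) & \text{if } D < \sigma_0^2 \\ H_2(\alpha) + \frac{\alpha}{2} \log\left(\frac{\alpha \sigma_1^2}{D - (1 - \alpha) \sigma_0^2}\right) & \text{if } \sigma_0^2 < D \le (1 - \alpha) \sigma_0^2 + \alpha \sigma_1^2 \end{cases}$$

For a strict sparsity model we have $\sigma_0^2 \to 0$, and we have

$$R_{mix}(D) = H_2(\alpha) + \frac{\alpha}{2} \log\left(\frac{\alpha \sigma_1^2}{D}\right) \quad \text{if } 0 < D \le \alpha \sigma_1^2$$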
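As a quick way to get a feel for the strict-sparsity formula, the sketch below evaluates $R_{mix}(D) = H_2(\alpha) + \frac{\alpha}{2}\log(\alpha\sigma_1^2/D)$ in Python with natural logarithms. The function and parameter names are illustrative, and the function deliberately refuses distortions outside the range $0 < D \le \alpha\sigma_1^2$ for which the formula is stated.

```python
import math

def binary_entropy(p):
    """Binary entropy H_2(p) in nats, with H_2(0) = H_2(1) = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log(p) - (1.0 - p) * math.log(1.0 - p)

def rate_distortion_sparse_gaussian(D, alpha, sigma1_sq):
    """R_mix(D) for the strict sparsity model, valid for 0 < D <= alpha * sigma1_sq."""
    if not 0.0 < D <= alpha * sigma1_sq:
        raise ValueError("formula is only stated for 0 < D <= alpha * sigma1_sq")
    return binary_entropy(alpha) + 0.5 * alpha * math.log(alpha * sigma1_sq / D)

# Hypothetical example: 10% non-zero entries with unit variance.
alpha, sigma1_sq = 0.1, 1.0
rates = [rate_distortion_sparse_gaussian(D, alpha, sigma1_sq) for D in (0.001, 0.01, 0.05, 0.1)]
```

The rate decreases monotonically in $D$ and reduces to $H_2(\alpha)$ at the endpoint $D = \alpha\sigma_1^2$, which is the cost of describing the support alone.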